Building a Flexible AI Model Selection System: Gateways, Providers, and Smart Routing

January 1, 2026


When building AI-powered applications, you quickly face a fundamental question: how do you let users choose their AI models while maintaining control, flexibility, and cost efficiency?

We needed a system that could:

  1. Support multiple AI providers (OpenAI, Anthropic, Groq, DeepSeek, etc.)
  2. Route through gateways (Vercel AI Gateway, OpenRouter) for centralized monitoring
  3. Allow direct provider access for users with their own API keys
  4. Respect user preferences while enforcing organization policies
  5. Gracefully fall back when configurations are missing
  6. Handle edge cases like models with nested naming conventions

This post walks through how we solved this problem.


Architecture Overview

Our AI model selection system has three main layers:

Loading diagram...

Gateways vs Direct Providers: Understanding the Difference

AI Gateways

Think of an AI Gateway as a smart proxy that sits between your application and AI providers. We primarily use Vercel AI Gateway.

Loading diagram...

How gateway routing works:

The gateway uses a simple prefix system to route requests:

// These all go through the same gateway endpoint:
gateway("openai/gpt-4o")      // → Routes to OpenAI
gateway("anthropic/claude-3") // → Routes to Anthropic
gateway("groq/llama-3.3-70b") // → Routes to Groq

When to use a gateway:

  • You want unified monitoring across providers
  • You need centralized cost tracking
  • You want automatic provider fallbacks
  • You prefer a single API key for all providers

Direct Providers

Direct providers connect straight to the AI service without an intermediary:

Loading diagram...

When to use direct providers:

  • Your organization has direct contracts with providers
  • You need provider-specific features not available through gateways
  • You want lower latency (no gateway hop)
  • You need to use the provider's native API exactly

The Model Selection Hierarchy

One of our key innovations is a cascading preference system. When a request comes in, we check multiple levels to determine which model to use:

Loading diagram...

Priority 1: Organization Enforced Preference

When an organization sets enforceForMembers: true, ALL team members must use that model. This cannot be overridden by user preferences.

Use Case: Compliance requirements - "All chat interactions must use Claude for safety reasons"

Priority 2-3: Agent-Specific Preferences

Users and organizations can set preferences for specific agents. For example: "Use GPT-4o for the Code Review Agent"

Priority 4-5: Task Type Preferences

Preferences can be set by task category:

  • SIMPLE: Quick tasks like title generation
  • COMPLEX: Document generation, detailed analysis
  • REASONING: Tasks requiring deep logical thinking
  • CHAT: Interactive conversations
  • TOOL_CALLING: Tasks requiring function/tool execution

Priority 6-7: Defaults

System defaults and hardcoded fallbacks ensure the system always works, even with no configuration.
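
A minimal sketch of how this cascade can be resolved, with hypothetical lookup functions standing in for database queries (not our exact implementation):

// Each lookup returns a preference or null; the array order encodes the
// priority above, and the first non-null result wins.
type TaskType = "SIMPLE" | "COMPLEX" | "REASONING" | "CHAT" | "TOOL_CALLING";

interface ResolvedPreference {
  modelId: string;
  overrideProvider?: string;
  source: string; // e.g. "ORG_ENFORCED", "USER_AGENT", "SYSTEM_DEFAULT"
}

type PreferenceLookup = (taskType: TaskType) => Promise<ResolvedPreference | null>;

async function resolvePreference(
  taskType: TaskType,
  lookupsInPriorityOrder: PreferenceLookup[] // org-enforced first, system default last
): Promise<ResolvedPreference | null> {
  for (const lookup of lookupsInPriorityOrder) {
    const preference = await lookup(taskType);
    if (preference) return preference;
  }
  return null; // caller falls through to the hardcoded fallback
}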


Provider Mappings: Same Model, Multiple Routes

Here's where it gets interesting. The same logical model (like "GPT-4o") can be accessed through different providers:

Loading diagram...

When we select a model, we also need to pick which route to use. Our priority:

  1. User's explicit override (if they specified a provider for this model)
  2. User's default provider (their configured gateway/direct provider)
  3. First available mapping (fallback)
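
A sketch of that mapping selection (the types are illustrative; the mapping shape matches what the implementation section uses later):

interface ProviderMapping {
  provider: string;        // e.g. "GROQ", "VERCEL_GATEWAY", "CLOUDFLARE_AI"
  providerModelId: string; // e.g. "groq/llama-3.3-70b-versatile"
}

function selectProviderMapping(
  mappings: ProviderMapping[],
  preferredProvider?: string // explicit override, or the user's default provider
): ProviderMapping {
  // Use the preferred provider's mapping if one exists, otherwise the first available.
  const preferred = mappings.find((m) => m.provider === preferredProvider);
  return preferred ?? mappings[0];
}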

Real-World Scenarios

Let me walk through some concrete examples:

Scenario 1: Simple Direct Access

Setup: User has OpenAI API key configured directly.

Loading diagram...

Result: Direct API call to api.openai.com

Scenario 2: Gateway with Multiple Providers

Setup: User has Vercel AI Gateway with OpenAI, Anthropic, and Groq enabled.

Loading diagram...

Result: Call through Vercel Gateway to Groq

Scenario 3: The Tricky Nested Prefix Case

This one caused us some headaches! Groq hosts some OpenAI open-source models with interesting names:

Groq's Model: openai/gpt-oss-120b
(This is an OpenAI model running on Groq's infrastructure)

The Problem:

Loading diagram...

The Solution:

Loading diagram...

The code:

function normalizeModelForGateway(modelName: string): string {
  // Handle "groq/openai/gpt-oss-*" pattern
  if (modelName.startsWith("groq/") && modelName.includes("/", 5)) {
    return modelName.slice(5); // Remove "groq/"
  }
  return modelName;
}
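
With this in place, only the nested case is rewritten:

normalizeModelForGateway("groq/openai/gpt-oss-120b"); // → "openai/gpt-oss-120b"
normalizeModelForGateway("groq/llama-3.3-70b");       // → unchanged (no nested prefix)
normalizeModelForGateway("openai/gpt-4o");            // → unchanged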

Scenario 4: Organization Enforcement

Setup: Organization mandates Claude for all chat (compliance requirement).

Loading diagram...

Result: Claude used regardless of user preference

Scenario 5: Same Provider via Multiple Routes (Multi-Gateway)

Setup: User has Groq configured as a direct provider AND through multiple gateways. This is common when organizations want flexibility between direct access and gateway features.

User Configuration:
├── Direct Provider: GROQ (API Key: gsk_xxx, isDefault: false)
├── Vercel AI Gateway (API Key: vck_xxx, isDefault: true, enabledProviders: [groq, openai])
└── Cloudflare AI Gateway (API Key: cf_xxx, isDefault: false, enabledProviders: [groq])

Model: llama-3.3-70b-versatile
Available Mappings:
├── GROQ → "llama-3.3-70b-versatile"
├── VERCEL_GATEWAY → "groq/llama-3.3-70b-versatile"
└── CLOUDFLARE_AI → "@cf/meta/llama-3.3-70b-instruct"

How does routing work?

Loading diagram...

The key is the isDefault: true flag:

  1. System queries for user's default provider config
  2. Vercel Gateway has isDefault: true → It's selected as preferredProvider
  3. Model mapping for VERCEL_GATEWAY is used: groq/llama-3.3-70b-versatile
  4. Request goes: App → Vercel Gateway → Groq API

Overriding the default route:

Users can force a specific route using overrideProvider in their model preference:

-- Force direct Groq for llama models
INSERT INTO user_model_preference
  (user_id, task_type, model_id, override_provider)
VALUES
  ('user123', 'CHAT', 'llama-3.3-70b', 'GROQ');

Now the request goes: App → Groq API (bypassing gateway)

Decision Matrix:

| Default Provider | Override | Route Used | Path |
|------------------|----------|------------|------|
| VERCEL_GATEWAY | (none) | Vercel Gateway | App → Vercel → Groq |
| VERCEL_GATEWAY | GROQ | Direct Groq | App → Groq |
| VERCEL_GATEWAY | CLOUDFLARE_AI | Cloudflare | App → Cloudflare → Groq |
| GROQ | (none) | Direct Groq | App → Groq |

Why have multiple routes to the same provider?

| Route | Use Case |
|-------|----------|
| Direct Groq | Lower latency, no gateway overhead |
| Via Vercel Gateway | Centralized logging, cost tracking, fallbacks |
| Via Cloudflare AI | Edge caching, geographic routing |

Scenario 6: Agent with Task-Specific Provider Override

Setup: User has both Vercel AI Gateway (default) and OpenAI Direct configured. User sets OpenAI Direct as the override provider for TOOL_CALLING tasks.

Loading diagram...

The Key Innovation:

When a user sets an overrideProvider for a specific task type, the system now:

  1. Bypasses token exchange - Instead of using the default provider's token, passes the override provider's API key directly
  2. Uses the correct gateway URL - Routes to the override provider's endpoint (e.g., api.openai.com instead of Vercel Gateway)
  3. Falls back gracefully - If override provider has no API key configured, falls back to default provider
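
A simplified sketch of that resolution logic (names and shapes are hypothetical):

// Resolve the credentials an agent should use for a given task type.
interface ProviderConfig {
  provider: string; // e.g. "OPENAI_DIRECT", "VERCEL_GATEWAY"
  apiKey?: string;  // decrypted key from the user's provider config
  baseUrl: string;  // e.g. "https://api.openai.com/v1" or the gateway URL
}

interface AgentCredentials {
  apiKey: string;
  baseUrl: string;
  viaTokenExchange: boolean;
}

async function resolveAgentCredentials(
  overrideConfig: ProviderConfig | null,                // from the task-specific preference, if any
  exchangeDefaultToken: () => Promise<AgentCredentials> // token exchange for the default provider
): Promise<AgentCredentials> {
  // Override provider with a configured key: pass it through directly.
  if (overrideConfig?.apiKey) {
    return {
      apiKey: overrideConfig.apiKey,
      baseUrl: overrideConfig.baseUrl,
      viaTokenExchange: false,
    };
  }
  // No usable override: fall back to the default provider via token exchange.
  return exchangeDefaultToken();
}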

Database configuration:

-- User's provider configs
INSERT INTO user_cloud_provider_config VALUES
  ('config1', 'user123', 'VERCEL_GATEWAY', TRUE, TRUE, '{"apiKey": "vck_xxx"}'),
  ('config2', 'user123', 'OPENAI_DIRECT', FALSE, TRUE, '{"apiKey": "sk-xxx"}');

-- Task-specific override for TOOL_CALLING
INSERT INTO user_model_preference
  (user_id, task_type, model_id, override_provider)
VALUES
  ('user123', 'TOOL_CALLING', 'gpt-4o', 'OPENAI_DIRECT');

Result:

  • CHAT tasks → Vercel Gateway (default)
  • TOOL_CALLING tasks → OpenAI Direct (override)

How Agent Provider Routing Works

The system has two modes for providing AI credentials to agents:

Mode 1: Token Exchange (Default Provider)

When there's no override, agents use secure token exchange:

Loading diagram...

Mode 2: Direct Pass-Through (Override Provider)

When user has an override provider, the API key is passed directly:

Loading diagram...

Why Two Modes?

| Mode | When Used | Benefit |
|------|-----------|---------|
| Token Exchange | Default provider, no override | Secure - agents never see raw API keys |
| Direct Pass-Through | Override provider set | Respects user's task-specific routing preference |

The direct pass-through is necessary because the token exchange endpoint only knows about the default provider. When users configure different providers for different task types (e.g., OpenAI Direct for tool calling), the override provider's credentials must be passed explicitly.


Implementation Deep Dive

The Model Selection Function

async function selectModelDynamic(
  options: {
    taskType: "SIMPLE" | "COMPLEX" | "REASONING" | "CHAT" | "TOOL_CALLING";
    complexity?: "simple" | "medium" | "complex";
    requiresToolCalling?: boolean;
    preferredProvider?: string;
  },
  context: {
    userId: string;
    organizationId?: string;
    agentId?: string;
  }
): Promise<ModelSelectionResult> {

  // 1. Check database for effective preference
  const preference = await getEffectiveModelPreference(
    context.userId,
    context.organizationId,
    options.taskType,
    context.agentId
  );

  if (preference?.model) {
    // 2. Select the right provider mapping
    const mapping = selectProviderMapping(
      preference.model.providerMappings,
      preference.overrideProvider || options.preferredProvider
    );

    return {
      modelId: preference.model.id,
      providerModelId: mapping.providerModelId,
      provider: mapping.provider,
      source: preference.source
    };
  }

  // 3. Fall back to system default
  const defaultModel = await getTaskDefaultModel(options.taskType);
  if (defaultModel) {
    return { /* default model result */ };
  }

  // 4. Last resort: hardcoded fallback
  return getHardcodedFallback(options.taskType);
}
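
Calling it for a chat task might look like this (the IDs and the result shown are illustrative):

const selection = await selectModelDynamic(
  { taskType: "CHAT" },
  { userId: "user123", organizationId: "org456" }
);

// selection might be:
// {
//   modelId: "llama-3.3-70b",
//   providerModelId: "groq/llama-3.3-70b-versatile",
//   provider: "VERCEL_GATEWAY",
//   source: "user_task_preference"
// }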

The Model Instantiation Function

function getModel(
  modelName: string,
  context?: { apiKey?: string }
): LanguageModel {

  // 1. Normalize for gateway (handle nested prefixes)
  const normalized = normalizeModelForGateway(modelName);

  // 2. Route based on configuration
  if (context?.apiKey) {
    // Custom API key provided (per-user or per-org)
    const gateway = createGateway({ apiKey: context.apiKey });
    return gateway(formatModelName(normalized));
  }

  if (globalGatewayKey) {
    // Use global gateway from environment
    return globalGateway(formatModelName(normalized));
  }

  // 3. Fall back to direct providers
  const provider = extractProvider(modelName);
  switch (provider) {
    case "groq": return groqProvider(extractModelName(modelName));
    case "openai": return openaiProvider(extractModelName(modelName));
    case "anthropic": return anthropicProvider(extractModelName(modelName));
  }

  // Unknown provider and no gateway configured: fail loudly rather than return undefined.
  throw new Error(`Unsupported provider for model: ${modelName}`);
}
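
The helpers used above (extractProvider, extractModelName, formatModelName) are small string utilities. One plausible shape, shown for completeness; our exact rules may differ, and defaulting un-prefixed names to Groq in formatModelName is an assumption based on the defaults table below:

// "groq/llama-3.3-70b" → "groq"; names without a prefix return "".
function extractProvider(modelName: string): string {
  const slash = modelName.indexOf("/");
  return slash === -1 ? "" : modelName.slice(0, slash);
}

// "groq/openai/gpt-oss-120b" → "openai/gpt-oss-120b" (keeps nested names intact).
function extractModelName(modelName: string): string {
  const slash = modelName.indexOf("/");
  return slash === -1 ? modelName : modelName.slice(slash + 1);
}

// Gateways expect "provider/model"; assume un-prefixed names default to Groq.
function formatModelName(modelName: string): string {
  return modelName.includes("/") ? modelName : `groq/${modelName}`;
}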

Database Design

Our schema supports the full flexibility of the system:

Loading diagram...
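
In TypeScript terms, the core entities look roughly like this (field names follow the SQL snippets earlier in the post; the real schema has more columns):

// A user's (or organization's) configured route: a gateway or a direct provider.
interface CloudProviderConfig {
  id: string;
  userId: string;
  provider: string;    // "VERCEL_GATEWAY", "OPENAI_DIRECT", "GROQ", ...
  isDefault: boolean;  // at most one default route per user
  isEnabled: boolean;
  config: { apiKey: string; enabledProviders?: string[] };
}

// A preference row in the cascade (user- or org-scoped, optionally per agent).
interface ModelPreferenceRow {
  userId?: string;
  organizationId?: string;
  taskType: string;            // "SIMPLE" | "COMPLEX" | "REASONING" | "CHAT" | "TOOL_CALLING"
  agentId?: string;
  modelId: string;
  overrideProvider?: string;   // forces a specific route for this preference
  enforceForMembers?: boolean; // org-only: members cannot override
}

// How a logical model is named on a particular route.
interface ModelProviderMapping {
  modelId: string;         // e.g. "llama-3.3-70b"
  provider: string;        // e.g. "GROQ" or "VERCEL_GATEWAY"
  providerModelId: string; // e.g. "groq/llama-3.3-70b-versatile"
}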

Default Models by Task Type

We've carefully selected defaults optimized for each task type:

| Task Type | Default Model | Provider | Why |
|-----------|---------------|----------|-----|
| SIMPLE | llama-3.1-8b-instant | Groq | Ultra-fast inference |
| COMPLEX | llama-3.3-70b-versatile | Groq | Balance of speed and capability |
| REASONING | deepseek-r1-distill-llama-70b | Groq | Specialized for deep thinking |
| CHAT | llama-3.3-70b-versatile | Groq | Natural conversations |
| TOOL_CALLING | openai/gpt-oss-120b | Groq | Reliable function calling |
| EMBEDDING | text-embedding-3-small | OpenAI | Industry standard |
| IMAGE | dall-e-3 | OpenAI | Best quality |


Lessons Learned

1. Gateway Model Names Are Tricky

Gateways use provider/model format, but what happens when the model name itself contains a /? We learned this the hard way with Groq's openai/gpt-oss-120b model.

Solution: Normalize model names before sending to gateway.

2. Preference Hierarchies Need Clear Priority

Without clear priority rules, you get conflicts. "User wants X, org wants Y, system default is Z" - who wins?

Solution: Document and implement a strict priority order. Organization enforcement trumps everything.

3. Fallbacks Are Essential

What happens when the database is down? When a user hasn't configured anything? You need graceful degradation.

Solution: Multiple fallback layers - system defaults → env vars → hardcoded values.
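
A sketch of the last two layers, using the model names from the defaults table above; generalizing the FALLBACK_*_MODEL environment variable pattern to every task type is an assumption:

// Env overrides (FALLBACK_<TASK>_MODEL) win over the hardcoded table.
const HARDCODED_FALLBACKS: Record<string, string> = {
  SIMPLE: "groq/llama-3.1-8b-instant",
  COMPLEX: "groq/llama-3.3-70b-versatile",
  REASONING: "groq/deepseek-r1-distill-llama-70b",
  CHAT: "groq/llama-3.3-70b-versatile",
  TOOL_CALLING: "groq/openai/gpt-oss-120b",
};

function getFallbackModelName(taskType: string): string {
  const fromEnv = process.env[`FALLBACK_${taskType}_MODEL`];
  return fromEnv ?? HARDCODED_FALLBACKS[taskType] ?? HARDCODED_FALLBACKS.CHAT;
}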

4. Provider Mappings Enable Flexibility

The same model through different providers might have different names, pricing, or capabilities.

Solution: Separate the "what" (model) from the "how" (provider) in your database design.

5. Token Exchange Has Limits

We initially used token exchange for all agent API key resolution. But token exchange only knows about the default provider - it can't respect task-specific overrides.

Problem:

User selects: "GPT-4o via OpenAI Direct" for TOOL_CALLING
Token exchange returns: Vercel Gateway config (default)
Agent routes: GPT-4o through Vercel Gateway (wrong!)

Solution: For override providers, bypass token exchange and pass the decrypted API key directly in the agent's configurable. This allows agents to use the user's task-specific provider preference.

6. Order of Precedence Matters in Unified Servers

Our unified server was always overwriting ai_api_key with the token exchange result, ignoring any directly-passed keys.

The Bug:

// Bad: Always uses token exchange, ignores passed keys
ai_api_key: exchangedCredentials?.apiKey,

The Fix:

// Good: Direct pass-through takes precedence over token exchange
ai_api_key:
  configurable.ai_api_key ||    // Override provider's key (direct)
  context.ai_api_key ||         // From CopilotKit context
  exchangedCredentials?.apiKey, // Token exchange (fallback)

Configuration Quick Reference

Environment Variables

# Primary Gateway (recommended)
AI_GATEWAY_API_KEY=vck_xxxxx        # Vercel AI Gateway key

# Direct Provider Fallbacks (optional)
OPENAI_API_KEY=sk-xxxxx             # Direct OpenAI
ANTHROPIC_API_KEY=sk-ant-xxxxx      # Direct Anthropic
GROQ_API_KEY=gsk_xxxxx              # Direct Groq

# Fallback Models (override defaults)
FALLBACK_SIMPLE_MODEL=groq/llama-3.1-8b-instant
FALLBACK_COMPLEX_MODEL=groq/llama-3.3-70b-versatile

Conclusion

Building a flexible AI model selection system requires thinking about:

  1. Multiple access paths (gateways vs direct providers)
  2. Preference hierarchies (user → org → system)
  3. Provider abstraction (same model, different routes)
  4. Edge cases (nested names, missing configs, enforcement)
  5. Graceful degradation (fallbacks at every level)

The result is a system that gives users control while maintaining organizational oversight and operational reliability.


What's Next?

We're continuing to improve this system:

  • Task-specific provider routing: Users can now route different task types through different providers (e.g., OpenAI Direct for tool calling, Anthropic for reasoning)
  • Agent provider override: LangGraph agents respect user's override provider preferences
  • Cost optimization: Automatic routing to cheaper providers for simple tasks
  • Latency-based routing: Choose providers based on response time
  • A/B testing: Easy model comparison for quality evaluation
  • Usage analytics: Per-user and per-org cost tracking

Have questions about implementing something similar? Reach out - we're happy to discuss!


See also: Technical Architecture Documentation