Building a Flexible AI Model Selection System: Gateways, Providers, and Smart Routing
When building AI-powered applications, you quickly face a fundamental question: how do you let users choose their AI models while maintaining control, flexibility, and cost efficiency?
We needed a system that could:
- Support multiple AI providers (OpenAI, Anthropic, Groq, DeepSeek, etc.)
- Route through gateways (Vercel AI Gateway, OpenRouter) for centralized monitoring
- Allow direct provider access for users with their own API keys
- Respect user preferences while enforcing organization policies
- Gracefully fall back when configurations are missing
- Handle edge cases like models with nested naming conventions
This post walks through how we solved this problem.
Architecture Overview
Our AI model selection system has three main layers: routing (gateways vs. direct providers), a cascading preference hierarchy that picks the model, and provider mappings that pick the route for that model. The sections below cover each in turn.
Gateways vs Direct Providers: Understanding the Difference
AI Gateways
Think of an AI Gateway as a smart proxy that sits between your application and AI providers. We primarily use Vercel AI Gateway.
How gateway routing works:
The gateway uses a simple prefix system to route requests:
// These all go through the same gateway endpoint:
gateway("openai/gpt-4o") // → Routes to OpenAI
gateway("anthropic/claude-3") // → Routes to Anthropic
gateway("groq/llama-3.3-70b") // → Routes to Groq
When to use a gateway:
- You want unified monitoring across providers
- You need centralized cost tracking
- You want automatic provider fallbacks
- You prefer a single API key for all providers
Direct Providers
Direct providers connect straight to the AI service without an intermediary:
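For example, a direct OpenAI connection with the AI SDK looks like this (a minimal sketch, assuming the @ai-sdk/openai package):

import { createOpenAI } from "@ai-sdk/openai";

// No gateway hop: requests go straight to api.openai.com
const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY });
const model = openai("gpt-4o");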
When to use direct providers:
- Your organization has direct contracts with providers
- You need provider-specific features not available through gateways
- You want lower latency (no gateway hop)
- You need to use the provider's native API exactly
The Model Selection Hierarchy
One of our key innovations is a cascading preference system. When a request comes in, we check multiple levels to determine which model to use:
Priority 1: Organization Enforced Preference
When an organization sets enforceForMembers: true, ALL team members must use that model. This cannot be overridden by user preferences.
Use Case: Compliance requirements - "All chat interactions must use Claude for safety reasons"
Priority 2-3: Agent-Specific Preferences
Users and organizations can set preferences for specific agents. For example: "Use GPT-4o for the Code Review Agent"
Priority 4-5: Task Type Preferences
Preferences can be set by task category:
- SIMPLE: Quick tasks like title generation
- COMPLEX: Document generation, detailed analysis
- REASONING: Tasks requiring deep logical thinking
- CHAT: Interactive conversations
- TOOL_CALLING: Tasks requiring function/tool execution
Priority 6-7: Defaults
System defaults and hardcoded fallbacks ensure the system always works, even with no configuration.
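In code, the cascade is a short-circuiting lookup that returns the first match. A simplified sketch of getEffectiveModelPreference (the find* helpers are hypothetical placeholders for the real database queries):

type TaskType = "SIMPLE" | "COMPLEX" | "REASONING" | "CHAT" | "TOOL_CALLING";

interface ModelPreference {
  model: { id: string; providerMappings: unknown[] };
  overrideProvider?: string;
  source: string; // which level of the hierarchy matched
}

async function getEffectiveModelPreference(
  userId: string,
  organizationId: string | undefined,
  taskType: TaskType,
  agentId?: string
): Promise<ModelPreference | null> {
  // 1. Organization-enforced preference beats everything
  if (organizationId) {
    const enforced = await findOrgEnforcedPreference(organizationId, taskType);
    if (enforced) return enforced;
  }

  // 2-3. Agent-specific preferences (user first, then org)
  if (agentId) {
    const agentPref =
      (await findUserAgentPreference(userId, agentId)) ??
      (organizationId ? await findOrgAgentPreference(organizationId, agentId) : null);
    if (agentPref) return agentPref;
  }

  // 4-5. Task-type preferences (user first, then org)
  const taskPref =
    (await findUserTaskPreference(userId, taskType)) ??
    (organizationId ? await findOrgTaskPreference(organizationId, taskType) : null);
  if (taskPref) return taskPref;

  // 6-7. Nothing matched: the caller falls through to system defaults
  // and hardcoded fallbacks
  return null;
}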
Provider Mappings: Same Model, Multiple Routes
Here's where it gets interesting. The same logical model (like "GPT-4o") can be accessed through different providers:
When we select a model, we also need to pick which route to use. Our priority:
- User's explicit override (if they specified a provider for this model)
- User's default provider (their configured gateway/direct provider)
- First available mapping (fallback)
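In code, this is a first-match scan over the model's mappings. A simplified sketch (the caller in selectModelDynamic below merges the override and the user's default into a single preferred value):

interface ProviderMapping {
  provider: string;        // e.g. "GROQ", "VERCEL_GATEWAY"
  providerModelId: string; // e.g. "groq/llama-3.3-70b-versatile"
}

function selectProviderMapping(
  mappings: ProviderMapping[],
  preferredProvider?: string // override first, then the user's default provider
): ProviderMapping {
  return (
    mappings.find((m) => m.provider === preferredProvider) || // 1-2. explicit preference
    mappings[0]                                               // 3. first available mapping
  );
}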
Real-World Scenarios
Let me walk through some concrete examples:
Scenario 1: Simple Direct Access
Setup: User has OpenAI API key configured directly.
Result: Direct API call to api.openai.com
Scenario 2: Gateway with Multiple Providers
Setup: User has Vercel AI Gateway with OpenAI, Anthropic, and Groq enabled.
Result: Call through Vercel Gateway to Groq
Scenario 3: The Tricky Nested Prefix Case
This one caused us some headaches! Groq hosts some OpenAI open-source models with interesting names:
Groq's Model: openai/gpt-oss-120b
(This is an OpenAI model running on Groq's infrastructure)
The Problem: Our internal model IDs prefix the hosting provider, so this model is stored as groq/openai/gpt-oss-120b. Sent to the gateway as-is, the name carries two provider prefixes, and the gateway doesn't recognize it.
The Solution: Strip the redundant groq/ prefix before the name reaches the gateway, so the gateway receives the ID it actually knows: openai/gpt-oss-120b.
The code:
function normalizeModelForGateway(modelName: string): string {
// Handle "groq/openai/gpt-oss-*" pattern
if (modelName.startsWith("groq/") && modelName.includes("/", 5)) {
return modelName.slice(5); // Remove "groq/"
}
return modelName;
}
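A quick sanity check of both cases:

normalizeModelForGateway("groq/openai/gpt-oss-120b");     // → "openai/gpt-oss-120b"
normalizeModelForGateway("groq/llama-3.3-70b-versatile"); // → unchanged (single prefix)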
Scenario 4: Organization Enforcement
Setup: Organization mandates Claude for all chat (compliance requirement).
Result: Claude used regardless of user preference
Scenario 5: Same Provider via Multiple Routes (Multi-Gateway)
Setup: User has Groq configured as a direct provider AND through multiple gateways. This is common when organizations want flexibility between direct access and gateway features.
User Configuration:
├── Direct Provider: GROQ (API Key: gsk_xxx, isDefault: false)
├── Vercel AI Gateway (API Key: vck_xxx, isDefault: true, enabledProviders: [groq, openai])
└── Cloudflare AI Gateway (API Key: cf_xxx, isDefault: false, enabledProviders: [groq])
Model: llama-3.3-70b-versatile
Available Mappings:
├── GROQ → "llama-3.3-70b-versatile"
├── VERCEL_GATEWAY → "groq/llama-3.3-70b-versatile"
└── CLOUDFLARE_AI → "@cf/meta/llama-3.3-70b-instruct"
How does routing work?
The key is the isDefault: true flag:
- The system queries for the user's default provider config
- Vercel Gateway has isDefault: true → it's selected as preferredProvider
- The model mapping for VERCEL_GATEWAY is used: groq/llama-3.3-70b-versatile
- The request goes: App → Vercel Gateway → Groq API
Overriding the default route:
Users can force a specific route using overrideProvider in their model preference:
-- Force direct Groq for llama models
INSERT INTO user_model_preference
(user_id, task_type, model_id, override_provider)
VALUES
('user123', 'CHAT', 'llama-3.3-70b', 'GROQ');
Now the request goes: App → Groq API (bypassing gateway)
Decision Matrix:
| Default Provider | Override | Route Used | Path |
|-----------------|----------|------------|------|
| VERCEL_GATEWAY | (none) | Vercel Gateway | App → Vercel → Groq |
| VERCEL_GATEWAY | GROQ | Direct Groq | App → Groq |
| VERCEL_GATEWAY | CLOUDFLARE_AI | Cloudflare | App → Cloudflare → Groq |
| GROQ | (none) | Direct Groq | App → Groq |
Why have multiple routes to the same provider?
| Route | Use Case |
|-------|----------|
| Direct Groq | Lower latency, no gateway overhead |
| Via Vercel Gateway | Centralized logging, cost tracking, fallbacks |
| Via Cloudflare AI | Edge caching, geographic routing |
Scenario 6: Agent with Task-Specific Provider Override
Setup: User has both Vercel AI Gateway (default) and OpenAI Direct configured. User sets OpenAI Direct as the override provider for TOOL_CALLING tasks.
The Key Innovation:
When a user sets an overrideProvider for a specific task type, the system now:
- Bypasses token exchange: instead of using the default provider's token, it passes the override provider's API key directly
- Uses the correct gateway URL: it routes to the override provider's endpoint (e.g., api.openai.com instead of the Vercel Gateway)
- Falls back gracefully: if the override provider has no API key configured, it falls back to the default provider
Database configuration:
-- User's provider configs
INSERT INTO user_cloud_provider_config VALUES
('config1', 'user123', 'VERCEL_GATEWAY', TRUE, TRUE, '{"apiKey": "vck_xxx"}'),
('config2', 'user123', 'OPENAI_DIRECT', FALSE, TRUE, '{"apiKey": "sk-xxx"}');
-- Task-specific override for TOOL_CALLING
INSERT INTO user_model_preference
(user_id, task_type, model_id, override_provider)
VALUES
('user123', 'TOOL_CALLING', 'gpt-4o', 'OPENAI_DIRECT');
Result:
- CHAT tasks → Vercel Gateway (default)
- TOOL_CALLING tasks → OpenAI Direct (override)
How Agent Provider Routing Works
The system has two modes for providing AI credentials to agents:
Mode 1: Token Exchange (Default Provider)
When there's no override, the agent exchanges a short-lived token for the default provider's credentials, so it never handles the raw API key.
Mode 2: Direct Pass-Through (Override Provider)
When the user has set an override provider, the decrypted API key is passed directly in the agent's configuration. A sketch of both modes:
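This is a condensed sketch of how the two modes fit together (resolveAgentCredentials and its helpers are illustrative, not our actual API; the ai_api_key field matches the precedence example later in the post):

// Sketch only: helper functions are hypothetical
async function resolveAgentCredentials(userId: string, taskType: TaskType) {
  const pref = await getEffectiveModelPreference(userId, undefined, taskType);

  if (pref?.overrideProvider) {
    // Mode 2: direct pass-through of the override provider's key
    const apiKey = await getDecryptedApiKey(userId, pref.overrideProvider);
    if (apiKey) {
      return { ai_api_key: apiKey, mode: "direct-pass-through" };
    }
    // No key configured for the override → fall through to Mode 1
  }

  // Mode 1: token exchange; the agent only ever sees a short-lived
  // credential for the user's default provider, never the raw key
  const exchanged = await exchangeTokenForDefaultProvider(userId);
  return { ai_api_key: exchanged.apiKey, mode: "token-exchange" };
}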
Why Two Modes?
| Mode | When Used | Benefit |
|------|-----------|---------|
| Token Exchange | Default provider, no override | Secure - agents never see raw API keys |
| Direct Pass-Through | Override provider set | Respects user's task-specific routing preference |
The direct pass-through is necessary because the token exchange endpoint only knows about the default provider. When users configure different providers for different task types (e.g., OpenAI Direct for tool calling), the override provider's credentials must be passed explicitly.
Implementation Deep Dive
The Model Selection Function
async function selectModelDynamic(
options: {
taskType: "SIMPLE" | "COMPLEX" | "REASONING" | "CHAT" | "TOOL_CALLING";
complexity?: "simple" | "medium" | "complex";
requiresToolCalling?: boolean;
preferredProvider?: string;
},
context: {
userId: string;
organizationId?: string;
agentId?: string;
}
): Promise<ModelSelectionResult> {
// 1. Check database for effective preference
const preference = await getEffectiveModelPreference(
context.userId,
context.organizationId,
options.taskType,
context.agentId
);
if (preference?.model) {
// 2. Select the right provider mapping
const mapping = selectProviderMapping(
preference.model.providerMappings,
preference.overrideProvider || options.preferredProvider
);
return {
modelId: preference.model.id,
providerModelId: mapping.providerModelId,
provider: mapping.provider,
source: preference.source
};
}
// 3. Fall back to system default
const defaultModel = await getTaskDefaultModel(options.taskType);
if (defaultModel) {
return { /* default model result */ };
}
// 4. Last resort: hardcoded fallback
return getHardcodedFallback(options.taskType);
}
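An example call, with an illustrative result (the concrete values here are hypothetical):

const result = await selectModelDynamic(
  { taskType: "CHAT" },
  { userId: "user123", organizationId: "org456" }
);
// e.g. → {
//   modelId: "llama-3.3-70b",
//   providerModelId: "groq/llama-3.3-70b-versatile",
//   provider: "VERCEL_GATEWAY",
//   source: "USER_TASK_PREFERENCE"
// }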
The Model Instantiation Function
function getModel(
modelName: string,
context?: { apiKey?: string }
): LanguageModel {
// 1. Normalize for gateway (handle nested prefixes)
const normalized = normalizeModelForGateway(modelName);
// 2. Route based on configuration
if (context?.apiKey) {
// Custom API key provided (per-user or per-org)
const gateway = createGateway({ apiKey: context.apiKey });
return gateway(formatModelName(normalized));
}
if (globalGatewayKey) {
// Use global gateway from environment
return globalGateway(formatModelName(normalized));
}
  // 3. Fall back to direct providers
  const provider = extractProvider(modelName);
  switch (provider) {
    case "groq": return groqProvider(extractModelName(modelName));
    case "openai": return openaiProvider(extractModelName(modelName));
    case "anthropic": return anthropicProvider(extractModelName(modelName));
    default:
      throw new Error(`No gateway key and no direct provider for: ${modelName}`);
  }
}
Database Design
Our schema supports the full flexibility of the system:
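A condensed sketch of the two core tables (column names follow the INSERT examples earlier in this post; the real schema has more fields, constraints, and org-level equivalents):

-- Sketch: columns inferred from the examples above, not the full schema
CREATE TABLE user_cloud_provider_config (
  id         TEXT PRIMARY KEY,
  user_id    TEXT NOT NULL,
  provider   TEXT NOT NULL,    -- e.g. 'VERCEL_GATEWAY', 'OPENAI_DIRECT', 'GROQ'
  is_default BOOLEAN NOT NULL, -- at most one default route per user
  is_enabled BOOLEAN NOT NULL,
  config     JSONB NOT NULL    -- encrypted API key, enabled providers, etc.
);

CREATE TABLE user_model_preference (
  user_id           TEXT NOT NULL,
  task_type         TEXT NOT NULL, -- SIMPLE | COMPLEX | REASONING | CHAT | TOOL_CALLING
  model_id          TEXT NOT NULL,
  override_provider TEXT           -- optional route override for this task type
);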
Default Models by Task Type
We've carefully selected defaults optimized for each task type:
| Task Type | Default Model | Provider | Why |
|-----------|--------------|----------|-----|
| SIMPLE | llama-3.1-8b-instant | Groq | Ultra-fast inference |
| COMPLEX | llama-3.3-70b-versatile | Groq | Balance of speed and capability |
| REASONING | deepseek-r1-distill-llama-70b | Groq | Specialized for deep thinking |
| CHAT | llama-3.3-70b-versatile | Groq | Natural conversations |
| TOOL_CALLING | openai/gpt-oss-120b | Groq | Reliable function calling |
| EMBEDDING | text-embedding-3-small | OpenAI | Industry standard |
| IMAGE | dall-e-3 | OpenAI | Best quality |
Lessons Learned
1. Gateway Model Names Are Tricky
Gateways use provider/model format, but what happens when the model name itself contains a /? We learned this the hard way with Groq's openai/gpt-oss-120b model.
Solution: Normalize model names before sending to gateway.
2. Preference Hierarchies Need Clear Priority
Without clear priority rules, you get conflicts. "User wants X, org wants Y, system default is Z" - who wins?
Solution: Document and implement a strict priority order. Organization enforcement trumps everything.
3. Fallbacks Are Essential
What happens when the database is down? When a user hasn't configured anything? You need graceful degradation.
Solution: Multiple fallback layers - system defaults → env vars → hardcoded values.
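A minimal sketch of that last layer, assuming env-var overrides named like the FALLBACK_* variables in the configuration reference below (the real getHardcodedFallback returns a full selection result rather than a bare string):

// Sketch: env override first, then a hardcoded constant
const HARDCODED_FALLBACKS: Record<string, string> = {
  SIMPLE: "groq/llama-3.1-8b-instant",
  COMPLEX: "groq/llama-3.3-70b-versatile",
  // ...one entry per task type
};

function getFallbackModel(taskType: string): string {
  return (
    process.env[`FALLBACK_${taskType}_MODEL`] ?? // e.g. FALLBACK_SIMPLE_MODEL
    HARDCODED_FALLBACKS[taskType]
  );
}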
4. Provider Mappings Enable Flexibility
The same model through different providers might have different names, pricing, or capabilities.
Solution: Separate the "what" (model) from the "how" (provider) in your database design.
5. Token Exchange Has Limits
We initially used token exchange for all agent API key resolution. But token exchange only knows about the default provider - it can't respect task-specific overrides.
Problem:
User selects: "GPT-4o via OpenAI Direct" for TOOL_CALLING
Token exchange returns: Vercel Gateway config (default)
Agent routes: GPT-4o through Vercel Gateway (wrong!)
Solution: For override providers, bypass token exchange and pass the decrypted API key directly in the agent's configurable. This allows agents to use the user's task-specific provider preference.
6. Order of Precedence Matters in Unified Servers
Our unified server was always overwriting ai_api_key with the token exchange result, ignoring any directly-passed keys.
The Bug:
// Bad: Always uses token exchange, ignores passed keys
ai_api_key: exchangedCredentials?.apiKey,
The Fix:
// Good: Direct pass-through takes precedence over token exchange
ai_api_key:
configurable.ai_api_key || // Override provider's key (direct)
context.ai_api_key || // From CopilotKit context
exchangedCredentials?.apiKey, // Token exchange (fallback)
Configuration Quick Reference
Environment Variables
# Primary Gateway (recommended)
AI_GATEWAY_API_KEY=vck_xxxxx # Vercel AI Gateway key
# Direct Provider Fallbacks (optional)
OPENAI_API_KEY=sk-xxxxx # Direct OpenAI
ANTHROPIC_API_KEY=sk-ant-xxxxx # Direct Anthropic
GROQ_API_KEY=gsk_xxxxx # Direct Groq
# Fallback Models (override defaults)
FALLBACK_SIMPLE_MODEL=groq/llama-3.1-8b-instant
FALLBACK_COMPLEX_MODEL=groq/llama-3.3-70b-versatile
Conclusion
Building a flexible AI model selection system requires thinking about:
- Multiple access paths (gateways vs direct providers)
- Preference hierarchies (user → org → system)
- Provider abstraction (same model, different routes)
- Edge cases (nested names, missing configs, enforcement)
- Graceful degradation (fallbacks at every level)
The result is a system that gives users control while maintaining organizational oversight and operational reliability.
What's Next?
We're continuing to improve this system:
- ✅ Task-specific provider routing: Users can now route different task types through different providers (e.g., OpenAI Direct for tool calling, Anthropic for reasoning)
- ✅ Agent provider override: LangGraph agents respect user's override provider preferences
- Cost optimization: Automatic routing to cheaper providers for simple tasks
- Latency-based routing: Choose providers based on response time
- A/B testing: Easy model comparison for quality evaluation
- Usage analytics: Per-user and per-org cost tracking
Have questions about implementing something similar? Reach out - we're happy to discuss!
See also: Technical Architecture Documentation
