Production AI applications need flexibility. You might want OpenAI's GPT-5 for complex reasoning, Claude for long-context analysis, Gemini for cost-effective batch processing, or OpenRouter for dynamic provider routing. Managing these providers separately creates operational overhead, but Fenic's unified inference framework lets you configure, route, and optimize across providers from a single codebase.
This guide shows you how to configure multiple LLM providers in Fenic, implement rate limiting, switch models dynamically, and build production-grade pipelines that leverage the strengths of different providers.
Why Use Multiple Providers
Different providers excel at different tasks:
- OpenAI delivers strong reasoning with models like GPT-5 and the o-series
- Anthropic's Claude handles extended context windows and complex analysis
- Google's Gemini offers fast, cost-effective inference with native file processing
- OpenRouter provides automatic failover and cost/latency-based routing across providers
A single provider limits your options. Multiple providers let you optimize for cost, latency, quality, and reliability per workload.
Setting Up Your First Multi-Provider Configuration
Fenic manages all provider configurations through a SemanticConfig object within your session. Start by configuring multiple language models with their rate limits.
```python
import fenic as fc
from fenic.api.session.config import (
    SessionConfig,
    SemanticConfig,
    OpenAILanguageModel,
    AnthropicLanguageModel,
    GoogleDeveloperLanguageModel,
)

config = SessionConfig(
    app_name="multi_provider_app",
    semantic=SemanticConfig(
        language_models={
            "gpt4": OpenAILanguageModel(
                model_name="gpt-4.1-nano",
                rpm=100,
                tpm=100
            ),
            "claude": AnthropicLanguageModel(
                model_name="claude-3-5-haiku-latest",
                rpm=100,
                input_tpm=100,
                output_tpm=100
            ),
            "gemini": GoogleDeveloperLanguageModel(
                model_name="gemini-2.0-flash",
                rpm=100,
                tpm=1000
            )
        },
        default_language_model="gpt4"
    )
)

session = fc.Session.get_or_create(config)
```
Each model receives a string alias ("gpt4", "claude", "gemini") that you'll use to reference it in semantic operations. The default_language_model determines which model runs when you don't explicitly specify one.
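If you omit model_alias in an operation, the default model handles it; passing an alias routes the call elsewhere. A minimal sketch, assuming a DataFrame df with a content column:

```python
# Runs on the default model ("gpt4" in the config above)
default_summary = df.select(
    fc.semantic.map("Summarize: {{ text }}", text=fc.col("content"))
)

# Route the same operation to Gemini instead
gemini_summary = df.select(
    fc.semantic.map(
        "Summarize: {{ text }}",
        text=fc.col("content"),
        model_alias="gemini",
    )
)
```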
Rate Limiting Across Providers
Provider rate limits vary by API tier. Fenic enforces these limits at the framework level, preventing 429 errors and ensuring reliable pipeline execution.
Provider-Specific Rate Limit Patterns
Different providers use different rate limit structures:
OpenAI and Google use combined token limits:
```python
OpenAILanguageModel(
    model_name="gpt-4.1-nano",
    rpm=100,    # Requests per minute
    tpm=10000   # Tokens per minute (input + output)
)
```
Anthropic separates input and output token limits:
```python
AnthropicLanguageModel(
    model_name="claude-opus-4-0",
    rpm=100,
    input_tpm=50000,   # Input tokens per minute
    output_tpm=25000   # Output tokens per minute
)
```
Fenic tracks token usage automatically and throttles requests to stay within your configured limits. This prevents quota exhaustion and ensures predictable pipeline behavior. According to the Fenic 0.5.0 release notes, the framework includes quota handling guardrails that fail fast with clear errors rather than silently retrying.
Switching Models Per Operation
Reference any configured model by its alias in semantic operations. This lets you route different workloads to different providers based on their strengths.
```python
# Use GPT-4 for complex extraction
structured_data = df.select(
    fc.semantic.extract(
        fc.col("document"),
        response_format=ComplexSchema,
        model_alias="gpt4"
    )
)

# Use Gemini for cost-effective summarization
summaries = df.select(
    fc.semantic.summarize(
        fc.col("content"),
        model_alias="gemini"
    )
)

# Use Claude for sentiment analysis
sentiment = df.select(
    fc.semantic.analyze_sentiment(
        fc.col("feedback"),
        model_alias="claude"
    )
)
```
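The ComplexSchema referenced above isn't defined in this snippet; semantic.extract expects a Pydantic model as its response_format. A hypothetical sketch of what such a schema might look like:

```python
from pydantic import BaseModel, Field

# Hypothetical schema for illustration; shape the fields to match your documents
class ComplexSchema(BaseModel):
    title: str = Field(description="Document title")
    author: str = Field(description="Author name, if stated")
    key_points: list[str] = Field(description="Main points made in the document")
```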
The HN research agent built by Typedef demonstrates this pattern: it uses one model for batch summarization and another for synthesis, letting each provider handle what it does best.
Model Profiles for Fine-Grained Control
Profiles let you configure multiple execution modes for the same model without duplicating configuration. This is useful for reasoning models where you want to balance speed versus quality.
```python
config = SemanticConfig(
    language_models={
        "o4": OpenAILanguageModel(
            model_name="o4-mini",
            rpm=1000,
            tpm=1000000,
            profiles={
                "fast": OpenAILanguageModel.Profile(reasoning_effort="low"),
                "thorough": OpenAILanguageModel.Profile(reasoning_effort="high")
            },
            default_profile="fast"
        ),
        "claude": AnthropicLanguageModel(
            model_name="claude-opus-4-0",
            rpm=100,
            input_tpm=100000,
            output_tpm=50000,
            profiles={
                "thinking_disabled": AnthropicLanguageModel.Profile(),
                "fast": AnthropicLanguageModel.Profile(thinking_token_budget=1024),
                "deep": AnthropicLanguageModel.Profile(thinking_token_budget=4096)
            },
            default_profile="fast"
        )
    },
    default_language_model="o4"
)

# SemanticConfig must be wrapped in a SessionConfig before creating the session
session = fc.Session.get_or_create(
    SessionConfig(app_name="multi_provider_app", semantic=config)
)
```
Use profiles by passing a ModelAlias object instead of a string:
```python
import fenic as fc
from fenic.core.types.semantic import ModelAlias

# Use the default "fast" profile
result = df.select(
    fc.semantic.map(
        "Analyze {{ text }}",
        text=fc.col("content"),
        model_alias="o4"
    )
)

# Override with the "thorough" profile for specific operations
detailed_result = df.select(
    fc.semantic.map(
        "Analyze {{ text }}",
        text=fc.col("content"),
        model_alias=ModelAlias(name="o4", profile="thorough")
    )
)
```
Profiles reduce configuration duplication and make it easy to experiment with different parameter combinations.
OpenRouter for Dynamic Provider Routing
OpenRouter aggregates multiple providers behind a single API. Fenic supports OpenRouter with provider-aware routing, letting you specify whether requests should route based on price, latency, or specific provider constraints.
```python
from fenic.api.session.config import OpenRouterLanguageModel

config = SemanticConfig(
    language_models={
        "cheap": OpenRouterLanguageModel(
            model_name="openai/gpt-oss-20b",
            profiles={
                "default": OpenRouterLanguageModel.Profile(
                    provider=OpenRouterLanguageModel.Provider(
                        sort="price"  # Route to cheapest provider
                    )
                )
            },
            default_profile="default"
        ),
        "fast": OpenRouterLanguageModel(
            model_name="anthropic/claude-sonnet-4-0-latest",
            profiles={
                "default": OpenRouterLanguageModel.Profile(
                    provider=OpenRouterLanguageModel.Provider(
                        sort="latency",      # Route to fastest provider
                        only=["Anthropic"]   # Ensure direct Anthropic routing
                    )
                )
            },
            default_profile="default"
        )
    },
    default_language_model="cheap"
)
```
OpenRouter's provider routing lets you optimize for different dimensions without changing your code. The Fenic 0.5.0 release added full OpenRouter support with structured outputs and flexible provider configuration.
Setting Environment Variables
Fenic reads API keys from environment variables. Set the keys for all providers you plan to use:
```bash
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=...
export COHERE_API_KEY=...
export OPENROUTER_API_KEY=sk-or-...
```
The Fenic 0.4.0 release introduced provider key validation at session creation. If keys are missing or invalid, Fenic fails immediately with clear errors rather than waiting until runtime.
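If you want to catch missing keys even earlier, a simple pre-flight check over the environment works before building the session. A minimal sketch; the list of required variables is an assumption based on the providers configured above:

```python
import os

# Hypothetical pre-flight check: fail early if a required key is unset
REQUIRED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"]

missing = [key for key in REQUIRED_KEYS if not os.environ.get(key)]
if missing:
    raise RuntimeError(f"Missing API keys: {', '.join(missing)}")

session = fc.Session.get_or_create(config)  # Fenic validates the keys it needs here
```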
Batch Processing with Concurrent Inference
Fenic handles batch inference automatically through its DataFrame API. When you apply semantic operations to a column, Fenic batches requests, manages rate limits, and processes rows concurrently.
```python
# Read documents
docs = session.read.docs("data/*.md", content_type="markdown")

# Extract structured data with automatic batching
results = docs.select(
    fc.col("file_path"),
    fc.semantic.extract(
        fc.col("content"),
        response_format=DocumentMetadata,
        model_alias="gpt4"
    ).alias("metadata")
)

# Fenic automatically:
# - Batches API requests
# - Respects rate limits
# - Handles retries
# - Manages concurrency
results.show()
```
For custom async operations, use async UDFs with configurable concurrency:
```python
import aiohttp

from fenic.api.functions import async_udf
from fenic.core.types import IntegerType

@async_udf(
    return_type=IntegerType,
    max_concurrency=20,   # Control parallelism
    timeout_seconds=5,
    num_retries=2
)
async def fetch_score(user_id: int) -> int:
    async with aiohttp.ClientSession() as session:
        async with session.get(f"https://api.example.com/score/{user_id}") as resp:
            data = await resp.json()
            return data["score"]

df = df.select(
    fc.col("user_id"),
    fetch_score(fc.col("user_id")).alias("score")
)
```
The async UDF feature in Fenic 0.4.0 enables controlled concurrency for I/O-bound workloads while maintaining DataFrame semantics.
Monitoring Inference Costs and Performance
Fenic tracks inference metrics locally. Query the built-in metrics table to analyze costs, latency, and token usage per model.
```python
metrics = session.table("fenic_system.query_metrics")

# View recent queries with their costs
metrics.select(
    "session_id",
    "execution_id",
    "total_lm_cost",
    "total_lm_requests",
    "execution_time_ms",
    "end_ts"
).show()

# Analyze total costs over time
cost_analysis = session.sql("""
    SELECT
        CAST(SUM(total_lm_cost) AS DOUBLE) AS total_cost,
        CAST(SUM(total_lm_requests) AS DOUBLE) AS total_requests,
        COUNT(*) AS query_count
    FROM {df}
    WHERE total_lm_cost > 0
""", df=metrics)

cost_analysis.show()
```
This visibility helps you optimize provider selection based on actual performance data.
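For example, you can derive an average cost per request from the same table to compare runs over time. A sketch using only the columns queried above:

```python
# Average cost per request and average latency per query, from the metrics table above
avg_stats = session.sql("""
    SELECT
        CAST(SUM(total_lm_cost) AS DOUBLE)
            / CAST(SUM(total_lm_requests) AS DOUBLE) AS avg_cost_per_request,
        AVG(execution_time_ms) AS avg_execution_time_ms
    FROM {df}
    WHERE total_lm_requests > 0
""", df=metrics)

avg_stats.show()
```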
Provider-Specific Capabilities
Different providers offer unique features. Fenic exposes these through the semantic API.
PDF Processing with Gemini
Google's Gemini models support native file processing. Use semantic.parse_pdf to convert PDFs to markdown:
```python
import fenic as fc
from fenic.api.session.config import (
    SessionConfig,
    SemanticConfig,
    GoogleDeveloperLanguageModel,
)

session = fc.Session.get_or_create(
    SessionConfig(
        app_name="pdf_processing",
        semantic=SemanticConfig(
            language_models={
                "gemini": GoogleDeveloperLanguageModel(
                    model_name="gemini-2.0-flash",
                    rpm=100,
                    tpm=1000
                )
            },
            default_language_model="gemini"
        )
    )
)

pdfs = session.read.pdf_metadata("data/**/*.pdf", recursive=True)

markdown = pdfs.select(
    fc.col("file_path"),
    fc.semantic.parse_pdf(
        fc.col("file_path"),
        page_separator="--- PAGE {page} ---",
        describe_images=True
    ).alias("markdown")
)
```
The PDF parsing capability in Fenic 0.5.0 uses Gemini's native file API for efficient processing.
Structured Output Strategies
Configure how Fenic requests structured outputs:
```python
OpenRouterLanguageModel(
    model_name="anthropic/claude-sonnet-4-0-latest",
    structured_output_strategy="prefer_response_format",  # or "prefer_tools"
    profiles={...}
)
```
This controls whether Fenic uses the provider's native response format API or tool-calling for structured outputs.
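Whichever strategy you pick, downstream code stays the same: pass a Pydantic model as response_format and Fenic applies the configured strategy. A hedged sketch, assuming the model above is registered under a "router" alias and using a hypothetical schema:

```python
from pydantic import BaseModel

class TicketFields(BaseModel):
    # Hypothetical schema for illustration
    category: str
    urgency: str

extracted = df.select(
    fc.semantic.extract(
        fc.col("ticket_text"),
        response_format=TicketFields,
        model_alias="router",  # assumed alias for the OpenRouter model above
    ).alias("fields")
)
```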
Real-World Pattern: Research Agent with Multiple Providers
The HN research agent demonstrates production patterns for multi-provider inference:
```python
# Configure providers for different workloads
config = SemanticConfig(
    language_models={
        "summarizer": GoogleDeveloperLanguageModel(
            model_name="gemini-2.0-flash",  # Fast, cheap for batch summaries
            rpm=100,
            tpm=10000
        ),
        "synthesizer": AnthropicLanguageModel(
            model_name="claude-opus-4-0",   # Deep reasoning for synthesis
            rpm=50,
            input_tpm=50000,
            output_tpm=25000
        )
    },
    default_language_model="summarizer"
)

# Batch summarization with Gemini
summaries = stories.select(
    fc.semantic.map(
        "Summarize: {{ text }}",
        text=fc.col("text"),
        model_alias="summarizer"
    ).alias("summary")
)

# Final synthesis with Claude
report = summaries.select(
    fc.semantic.map(
        "Synthesize these findings: {{ summary }}",
        summary=fc.col("summary"),
        model_alias="synthesizer"
    ).alias("report")
)
```
This pattern offloads bulk inference to a cost-effective provider while reserving expensive models for high-value synthesis tasks.
Best Practices
Set appropriate rate limits: Configure RPM and TPM based on your API tier. Underestimating limits causes throttling; overestimating risks quota exhaustion.
Use model aliases consistently: Pick descriptive, task-based names like "fast-summarizer" or "complex-reasoner" rather than provider names. This makes it easier to swap providers later (see the sketch after these best practices).
Monitor metrics regularly: Check the metrics table to identify cost spikes or latency issues. Use this data to adjust provider selection.
Validate API keys at startup: Fenic validates keys during session creation. Run a quick health check before production deployments to catch configuration issues early.
Leverage profiles for experimentation: Test different reasoning efforts or thinking budgets without creating separate model configurations.
Match providers to workloads: Use fast, cheap models for bulk processing. Reserve expensive models for complex reasoning or synthesis tasks.
Consider OpenRouter for flexibility: If you need dynamic routing or want to avoid vendor lock-in, configure OpenRouter with appropriate routing policies.
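As an illustration of task-based aliases, the sketch below names models after their role rather than their vendor, so swapping the provider behind "fast-summarizer" later is a one-line config change. Model choices and limits here are placeholders, not recommendations:

```python
config = SemanticConfig(
    language_models={
        # Role-based aliases: pipeline code never mentions a vendor name
        "fast-summarizer": GoogleDeveloperLanguageModel(
            model_name="gemini-2.0-flash", rpm=100, tpm=10000
        ),
        "complex-reasoner": AnthropicLanguageModel(
            model_name="claude-opus-4-0", rpm=50, input_tpm=50000, output_tpm=25000
        ),
    },
    default_language_model="fast-summarizer",
)
```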
Error Handling and Reliability
Fenic includes several reliability features:
Token capacity guardrails: Fenic validates that requests fit within model limits before sending them, preventing mid-pipeline failures.
Automatic retries: Transient errors trigger exponential backoff. Configure retry behavior through async UDF parameters.
Fail-fast on quota exhaustion: OpenAI 429 "quota exhausted" errors fail immediately with clear messages rather than retrying indefinitely.
Batch client stability: Provider-specific fixes ensure predictable behavior across OpenAI, Anthropic, Google, and OpenRouter clients.
These features are documented in the Fenic 0.5.0 release notes.
Going Further
Multi-provider management in Fenic extends beyond configuration. The framework supports:
- Embedding models from multiple providers alongside language models
- Cloud execution for distributed processing with cloud configuration
- MCP server integration for exposing Fenic tools to agents and assistants
- Declarative tool creation for building agent tools backed by DataFrame queries
For more advanced patterns, see:
- Building agentic applications with declarative APIs
- Orchestrating reliable agents with LangGraph
- Real-time context engineering for AI agents
Summary
Fenic's multi-provider support gives you the flexibility to optimize every workload. Configure providers through SemanticConfig, enforce rate limits automatically, switch models with aliases, and track performance through built-in metrics.
Start with a simple multi-provider setup, use the right model for each task, and let Fenic handle the operational complexity. Install Fenic with pip install fenic and check the official documentation to get started.