Fenic 0.4.0 Released: Declarative Tools, MCP, and HuggingFace — plus major DX & reliability gains
TL;DR: Upgrade now to unlock declarative tool creation for function calling, a production‑ready MCP server, GPT‑5 & Claude Opus 4.1 support, a new HuggingFace connector, directory loaders, local metrics tables, richer catalog metadata, clearer errors, and big performance & stability improvements.
```bash
pip install --upgrade fenic
```
What’s in it for you
- Declarative tools for agents: Define function‑calling tools as data — less boilerplate, safer types, faster iteration.
- Assistant integrations out‑of‑the‑box: Full MCP server so Claude Code, Gemini CLI, Cursor & friends can use your Fenic tools directly.
- Latest models, simpler ops: GPT‑5 & Claude Opus 4.1 support, plus provider key validation to fail fast.
- More data, fewer hops: Read HuggingFace datasets via `hf://…` URIs and load entire directories into DataFrames.
- See cost & performance: Built‑in local metrics table to analyze latency and spend per pipeline.
- Catalog you can trust: Descriptions on views/tables, thread‑safe local catalog, logical types in cloud catalog.
- Cleaner DX: Crisper errors (e.g., `union()` schema mismatches), handy `null()`/`empty()` helpers, clearer S3 auth behavior.
- Faster & sturdier: Thread‑safe concurrency, Rust regex validation, smarter retries and resource cleanup.
- Async UDFs for concurrent I/O: Run API/DB/MCP calls in parallel with ordering, retries, and timeouts—without leaving DataFrame semantics.
Declarative tool creation (catalog-backed) (⭐️)
Build LLM tools by declaring what the tool does and its parameters, then register it in the Fenic catalog. Catalog tools are type-safe, discoverable, and automatically consumable by MCP servers and fenic-serve.
Why it matters
- Drop up to 70% of agent boilerplate.
- Strongly-typed params via ToolParam with automatic validation.
- Tools are versionable metadata—easy to diff, review, and reuse.
- One definition, many runtimes (programmatic servers, ASGI, CLI).
```python
from fenic.api.session import Session
from fenic.core.mcp.types import ToolParam

session = Session.get_or_create()

# Your DataFrame query that implements the tool
df = session.create_dataframe(...)  # e.g., search, transform, summarize

# Register the tool in the catalog
session.catalog.create_tool(
    tool_name="my_tool",
    tool_description="A tool that searches documents",
    tool_query=df,  # The DataFrame query to execute
    tool_params=[
        ToolParam(
            name="search_term",
            description="The term to search for",
            allowed_values=None,  # Or e.g. ["bug", "feature", "note"]
            default_value="default",  # Optional
        ),
        ToolParam(
            name="limit",
            description="Max results to return",
            default_value=10,
        ),
    ],
    result_limit=50,  # Max rows to return
)
```
Pair this with Fenic’s semantic operators and you can roll out production‑grade agent tools in minutes.
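To make that pairing concrete, here is a hypothetical sketch of a tool_query that filters by a declared parameter and then applies a semantic operator. It assumes a `fc.tool_param(name, type)` expression is how declared parameters bind into the query plan (check the declarative-tools docs for the exact binding API), and the table and column names are made up:

```python
import fenic as fc
from fenic.api.session import Session
from fenic.core.mcp.types import ToolParam
from fenic.core.types import StringType

session = Session.get_or_create()

# Hypothetical: filter a docs table by a declared "category" parameter,
# then summarize matching rows with a semantic operator.
docs = session.table("docs")  # assumed existing table
tool_query = docs.filter(
    fc.col("category") == fc.tool_param("category", StringType)  # assumed binding API
).semantic.map("body", prompt="Summarize this document in two sentences.", model="gpt-5")

session.catalog.create_tool(
    tool_name="summarize_category",
    tool_description="Summarize documents in a given category",
    tool_query=tool_query,
    tool_params=[ToolParam(name="category", description="Category to summarize")],
)
```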
MCP servers: run Fenic tools anywhere
Fenic ships a complete Model Context Protocol (MCP) server with multiple ways to run it. Pick the style that fits your deployment and integrate assistants without leaving your data plane.
Programmatic — synchronous
```python
from fenic.api.mcp import create_mcp_server, run_mcp_server_sync
from fenic.api.session import Session

session = Session.get_or_create()
tools = session.catalog.list_tools()

server = create_mcp_server(
    session,
    "MyServer",
    tools=tools,  # list of ParameterizedToolDefinition
    concurrency_limit=8,
)

run_mcp_server_sync(
    server,
    transport="http",  # or "stdio"
    stateless_http=True,
    port=8000,
    host="127.0.0.1",
    path="/mcp",
)
```
Programmatic — asynchronous
```python
import asyncio

from fenic.api.mcp import create_mcp_server, run_mcp_server_async
from fenic.api.session import Session

session = Session.get_or_create()

async def main():
    tools = session.catalog.list_tools()
    server = create_mcp_server(session, "MyServer", tools=tools)
    await run_mcp_server_async(
        server,
        transport="http",
        stateless_http=True,
        port=8000,
        host="127.0.0.1",
    )

asyncio.run(main())
```
ASGI application (production-ready)
```python
from fenic.api.mcp import create_mcp_server, run_mcp_server_asgi
from fenic.api.session import Session

session = Session.get_or_create()
tools = session.catalog.list_tools()
server = create_mcp_server(session, "MyServer", tools=tools)

app = run_mcp_server_asgi(
    server,
    stateless_http=True,
    port=8000,
    host="127.0.0.1",
    path="/mcp",
)

# Launch with any ASGI server, e.g.:
# uvicorn myapp:app
```
CLI: fenic-serve
```bash
# Run with all catalog tools
fenic-serve

# Run with specific catalog tools
fenic-serve --tools sales_by_product sales_by_customer

# HTTP transport (default)
fenic-serve --transport http --port 8000 --host 127.0.0.1

# stdio transport (for direct tool integration)
fenic-serve --transport stdio

# Custom config + selected tools
fenic-serve --config-file ./session.config.json --tools my_tool

# Stateful HTTP sessions
fenic-serve --stateful-http
```
Direct server methods
```python
server = create_mcp_server(session, "MyServer", tools=[...])

# Synchronous
server.run(transport="http", stateless_http=True)

# Asynchronous (call from within an async function)
await server.run_async(transport="stdio")

# Get ASGI app directly
app = server.http_app(stateless_http=True, port=8000)
```
Transport options
- http: default; ideal for web services and APIs
- stdio: direct tool integration (e.g., Claude Desktop)
Key parameters
- stateless_http: when True, no session state is kept across requests; set False for stateful sessions
- concurrency_limit: max concurrent tool executions (default 8)
- transport: "http" or "stdio"
- port / host / path: HTTP server configuration
Async UDFs: concurrent I/O inside your DataFrames (⭐️)
Run network and tool-bound work in parallel without leaving DataFrame semantics. Async UDFs let you fan out API calls and database lookups across rows concurrently, while preserving type safety, input order, and predictable resource usage.
Why it matters
- Throughput for I/O workloads: Maximize parallelism on slow endpoints with bounded concurrency, retries, and timeouts.
- Production-safe by design: Ordered results, cooperative cancellation, and memory-aware buffering prevent tail-latency blowups.
- Fenic-native ergonomics: Keep transformations declarative—no bespoke asyncio plumbing required.
Key features
- Configurable concurrency — max_concurrency caps in-flight tasks
- Automatic retries — exponential backoff for transient failures
- Timeouts — per-item timeout_seconds to avoid hangs
- Ordered results — output matches input row order
- Resource management — bounded buffers; cancels pending work on error
- Type safety — declared return_type with clear, actionable errors
Usage example
```python
import aiohttp

import fenic as fc
from fenic.core.types import IntegerType

@fc.async_udf(
    return_type=IntegerType,
    max_concurrency=10,
    timeout_seconds=5,
    num_retries=2,
)
async def fetch_score(user_id: int) -> int:
    async with aiohttp.ClientSession() as session:
        async with session.get(f"https://api.example.com/score/{user_id}") as resp:
            data = await resp.json()
            return data["score"]

# Apply to a DataFrame
df = df.select(
    fc.col("user_id"),
    fetch_score(fc.col("user_id")).alias("score"),
)
```
Great for
- Parallel API calls inside DataFrame transforms
- Low-latency lookups against services/DBs
Under the hood, Fenic uses a unified event loop, smart buffer management, and cooperative cancellation. Individual failures return None (instead of failing the entire batch), keeping pipelines resilient.
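Because failed rows surface as None rather than aborting the batch, you can handle them explicitly downstream. A minimal sketch, assuming null-check predicates like `is_not_null()`/`is_null()` exist on column expressions (consistent with the snake_case DataFrame API, but verify against the docs):

```python
# Rows whose fetch_score call failed (after retries/timeouts) come back as None.
# Split them out for auditing instead of failing the whole pipeline.
scored = df.select(
    fc.col("user_id"),
    fetch_score(fc.col("user_id")).alias("score"),
)

ok = scored.filter(fc.col("score").is_not_null())  # assumed predicate
failed = scored.filter(fc.col("score").is_null())  # assumed predicate
failed.show()  # inspect failures, then retry or drop them
```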
Enhanced AI model support
GPT‑5 integration
- Specialized parameters including verbosity control and minimal reasoning modes for cost‑efficient runs.
- Smoother high‑throughput batch operations.
Claude Opus 4.1
- Access Anthropic’s latest capabilities with easy provider switching.
Provider key validation
- Validate API keys at session init with clear, fail‑fast errors.
- Eliminate surprise runtime failures from missing/misconfigured credentials.
```python
from fenic.api.session import Session

# Provider keys ARE validated during session creation.
# If any configured model has invalid keys, this will fail immediately.
session = Session.get_or_create()  # ← Validation happens HERE

# By the time you get here, all keys have already been validated
df.semantic.map("content", prompt="...", model="gpt-5")
```
Data processing enhancements
HuggingFace connector
Read datasets directly from the ML ecosystem using a simple URI scheme.
```python
# HuggingFace connector: use the hf:// scheme
df = session.read.csv("hf://datasets/squad/default/train.csv")

# or for Parquet files:
df = session.read.parquet("hf://datasets/cais/mmlu/astronomy/*.parquet")

df = df.semantic.extract("context", schema=QuestionAnswer, model="gpt-5")
```
Directory content loading
Turn folders into DataFrames for batch processing — recursively, with file metadata extracted automatically.
```python
from fenic.core.types import MarkdownType

df = session.read.docs(
    "/data/logs/",
    data_type=MarkdownType,
    recursive=True,
)
```
Local metrics table
Track inference metrics (latency, cost, model, tokens) locally for inspection and optimization.
```python
metrics = session.table("fenic_system.query_metrics")
metrics.select("model", "latency_ms", "cost_usd").order_by("latency_ms").show()
```
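To roll spend and latency up per model, the same DataFrame API applies. A sketch, assuming PySpark-style `group_by`/`agg` with `sum`/`avg` aggregate helpers under `fc` (the exact aggregate API may differ; see the docs):

```python
import fenic as fc

# Total spend and average latency per model (assumed aggregate helpers).
summary = (
    session.table("fenic_system.query_metrics")
    .group_by("model")
    .agg(
        fc.sum("cost_usd").alias("total_cost_usd"),
        fc.avg("latency_ms").alias("avg_latency_ms"),
    )
)
summary.order_by("total_cost_usd").show()
```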
Catalog improvements
- Add descriptions to views/tables for self‑documenting pipelines.
- Thread‑safe local catalog for safer concurrent work.
- Logical types in the cloud catalog and richer metadata.
```python
# Catalog improvements
session.catalog.create_table(
    "my_table",
    schema,
    description="My table description",
)
```
Developer experience (DX)
Better errors (e.g., `union()`)
More actionable messages on schema mismatches with suggested fixes and clearer stack traces.
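For example, a mismatched union now fails with a message that names the differing columns. A sketch with hypothetical data, assuming `create_dataframe` accepts a list of dicts:

```python
# Two frames with mismatched schemas: "amount" vs. "total".
df1 = session.create_dataframe([{"id": 1, "amount": 9.99}])
df2 = session.create_dataframe([{"id": 2, "total": 4.50}])

# Raises a schema-mismatch error that points at the offending columns
# (with a suggested fix), instead of an opaque internal failure.
df1.union(df2)
```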
Utility functions
- `null(data_type)` - creates null values
- `empty(data_type)` - creates empty values (empty arrays/structs, or null for primitives)
```python
# Utility functions
from fenic.api.functions import empty, null
from fenic.core.types import ArrayType, IntegerType

# Create a null-valued column
df = df.with_column("null_col", null(IntegerType))

# Create an empty array column
df = df.with_column("empty_array", empty(ArrayType(IntegerType)))
```
S3 integration
Friendlier messages when credentials are missing, automatic env detection, and support for multiple auth methods.
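In practice, a plain S3 read picks up credentials from the environment (env vars, shared profile, or instance role), and a missing-credentials failure now says which auth method to configure. A sketch with a hypothetical bucket and path:

```python
# Credentials are auto-detected from the environment; no extra config needed
# when AWS_PROFILE or AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY are set.
df = session.read.parquet("s3://my-bucket/events/2024/*.parquet")  # hypothetical path
df.show()
```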
Performance & reliability
- Thread‑safe operations.
- Rust regex validation for `rlike`/`ilike`/`like`.
- Async UDF flow reduces tool‑call latency.
- OpenAI retry logic handles intermittent 404s during batch processing.
- Clean shutdowns prevent memory leaks (proper event‑loop and task cancellation).
Upgrading from v0.3.x
No breaking changes; simply run:
```bash
pip install --upgrade fenic
```
Documentation & examples
- Enriched MCP server example README
- Fixed group‑by docstrings & examples
- Updated example notebooks; Colab‑friendly main README
- Refreshed clustering API docs
Try it out & tell us what you build
- Upgrade: `pip install --upgrade fenic`
- Explore: read the latest docs at docs.fenic.ai
- Engage: ⭐ the repo, open issues, or hop into Discord
Links
- GitHub: https://github.com/typedef-ai/fenic
- Release Notes: https://github.com/typedef-ai/fenic/releases/tag/v0.4.0
- Docs: https://docs.fenic.ai
- Example Notebooks: https://github.com/typedef-ai/fenic/tree/main/examples
Thank you for being part of the Fenic community — your feedback drives every release.
— The Fenic Team