
Introducing Fenic: A PySpark-Inspired DataFrame Framework for AI Workflows

Kostas Pardalis, Co-Founder
Rohit Rastogi, Founding Engineer

Building data products used to be "straightforward". Extract data, load into your warehouse, run transformation DAGs, expose through BI dashboards. The patterns were well-understood, tooling was mature, and scaling compute was the biggest challenge.


But this playbook falls apart with unstructured data.


Fenic is an opinionated, PySpark-inspired DataFrame framework from typedef.ai for building AI and agentic applications.


It transforms unstructured and structured data into insights using familiar DataFrame operations enhanced with semantic intelligence. It ships with first-class support for markdown, transcripts, and semantic operators, plus efficient batch inference across any model provider.
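
Here's roughly what that looks like in practice. The sketch below is based on the project's public README; treat the exact names and signatures (Session, SessionConfig, semantic.extract) as assumptions to verify against the current Fenic docs.

```python
# Hypothetical sketch of a Fenic pipeline; verify names against the docs.
import fenic as fc
from pydantic import BaseModel, Field

class Ticket(BaseModel):
    product: str = Field(description="Product the ticket mentions")
    severity: str = Field(description="One of: low, medium, high")

# Provider/model configuration is elided here; in the real API the session
# config declares which language models the semantic operators may call.
session = fc.Session.get_or_create(fc.SessionConfig(app_name="triage"))

df = session.create_dataframe({
    "raw": [
        "The export button crashes the app every time. Urgent!",
        "Love the new dashboard, but dark mode renders oddly.",
    ],
})

# Familiar DataFrame verbs plus a semantic operator: free text becomes a
# typed column whose schema is the Pydantic model above.
tickets = df.select(fc.semantic.extract(fc.col("raw"), Ticket).alias("ticket"))
tickets.show()
```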


The Unstructured Data Problem

Your most valuable data lives in PDFs, audio recordings, images, and video files. That clean ETL pipeline becomes a preprocessing maze requiring OCR models, transcription models, and computer vision—each introducing new failure modes, latency, and costs.


Once you extract text, the real complexity begins: turning unstructured text into something analysts and applications can actually use. The answer is often complex LLM pipelines that quickly become operational nightmares.


You're managing rate limits across providers, chunking documents to fit context windows, balancing expensive, accurate models against cheaper, less reliable ones, and constantly moving data between custom LLM scripts, warehouses, and inference infrastructure. The impedance mismatch creates overhead, duplication, and chaos.


How Fenic Works

Our insight: agentic workflows and AI applications are just pipelines. They take inputs, reason over context, generate outputs, and log results. That's not so different from traditional data workflows.


Fenic tames this chaos with a familiar abstraction: the DataFrame.


DataFrames bring structure and determinism to probabilistic systems.


Even with stochastic inference (LLMs, OCR, transcription), DataFrames provide:


  • Lineage: Every column and row has traceable origins, even when values come from model output
  • Columnar consistency: Whether summary, embedding, or toxicity_score, columns stay structured and meaningful
  • Deterministic transformations: Inference calls wrapped in declarative logic (model + prompt + input → output) for caching, versioning, and debugging; see the sketch below this list
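
To make the last point concrete, here is a generic Python sketch (not Fenic internals) of why declarative inference buys caching and versioning: the triple (model, prompt, input) is the cache key, so re-running a pipeline turns repeated stochastic calls into deterministic lookups.

```python
# Generic illustration, not Fenic source: a stochastic LLM call wrapped in a
# deterministic, cacheable transformation keyed on (model, prompt, input).
import hashlib
import json

_cache: dict[str, str] = {}

def cached_inference(model: str, prompt: str, text: str, call_llm) -> str:
    """call_llm is any (model, prompt, text) -> str function you supply."""
    key = hashlib.sha256(json.dumps([model, prompt, text]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(model, prompt, text)  # hit the model only on a miss
    return _cache[key]  # identical inputs always return the cached output
```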

Fenic provides a declarative DataFrame API for automatic optimization and auditability, but it's just Python, so you can write any imperative code you need for dynamic behavior, custom logic, or tight integration with your existing stack.
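
For instance, a handoff from the declarative pipeline to imperative code might look like the sketch below. Both .to_pydict() and page_oncall are illustrative assumptions here, not confirmed Fenic API.

```python
# Hypothetical: materialize pipeline output, then let plain Python take over.
def escalate_high_severity(tickets_df, page_oncall) -> None:
    rows = tickets_df.to_pydict()            # assumed collection method
    for ticket in rows["ticket"]:
        if ticket and ticket.get("severity") == "high":
            page_oncall(ticket)              # any imperative side effect you need
```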

Inference is First-Class

Fenic handles multiple model providers, rate limits, and failures. It self-throttles for reliable, production-grade pipelines while maximizing throughput with async I/O and concurrent request batching.


You declaratively define inference steps while Fenic handles orchestration.
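
As a rough illustration of the pattern Fenic automates (a generic asyncio sketch, not Fenic's implementation): self-throttled batch inference amounts to issuing every request concurrently while a semaphore bounds how many are in flight at once.

```python
# Generic sketch: concurrent requests with a cap on in-flight calls.
import asyncio

async def infer_all(inputs: list[str], call_model, max_in_flight: int = 8):
    """call_model is any async (str) -> str inference function you supply."""
    sem = asyncio.Semaphore(max_in_flight)

    async def one(text: str) -> str:
        async with sem:                  # throttle to respect provider limits
            return await call_model(text)

    # Launch everything at once; the semaphore keeps throughput high without
    # overrunning rate limits.
    return await asyncio.gather(*(one(t) for t in inputs))
```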


It supports AI-native data types (Markdown, JSON, transcripts, embeddings) with first-class column functions for direct pipeline manipulation.


The Preprocessing Layer for Agents

Let Fenic handle the heavy lifting of extracting structure, enriching context, and preparing clean data, then hand off to agents or downstream systems.


By decoupling batch inference from real-time reasoning, Fenic makes AI systems easier to debug, test, and scale without sacrificing responsiveness. Your agents focus on decisions, not data prep.
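
In outline, that decoupling might look like the following sketch; every name in it is illustrative rather than Fenic API. The batch phase pays inference costs up front, and the agent phase only reasons over rows that are already clean.

```python
# Generic two-phase sketch: offline preprocessing, online reasoning.
from typing import Callable

def batch_preprocess(docs: list[str], summarize: Callable[[str], str]) -> list[dict]:
    """Offline: run extraction/summarization once, amortizing retries and cost."""
    return [{"doc": d, "summary": summarize(d)} for d in docs]

def agent_step(question: str, prepared: list[dict],
               ask_llm: Callable[[str, str], str]) -> str:
    """Online: no OCR, chunking, or batch inference at decision time."""
    context = "\n".join(row["summary"] for row in prepared)
    return ask_llm(question, context)
```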


The Case for Open Sourcing Fenic

Fenic emerged from building Typedef, a platform unifying inference, search, and analytics. We had to rethink query engines from first principles, fusing OLAP, search, and inference into one system.


Why open source something so core to our business?


Local-first development: Developers deserve freedom to choose when to run workloads locally or remotely. Fenic makes this possible. It's a full engine, not just a thin client.


Privacy and control: LLMs and agentic workflows make locality critical. Sometimes data shouldn't leave your laptop, or edge inference unlocks new patterns.


Exploring the frontier: We're still figuring out production-grade AI-native systems. Open source is the best way to explore this frontier together.


Giving back: We stand on incredible shoulders: Apache Arrow, DataFusion, DuckDB, Polars. We were also inspired by LOTUS's work on semantic operators. Open sourcing Fenic expands this ecosystem.

What's Next

Our hope is that Fenic becomes a platform for the community to shape AI-native data systems.


If you're building AI applications or agentic workflows, or experimenting with inference pipelines, give Fenic a try.



We can't wait to see what you build.

