
The Learning-Curve Moat in Data Systems

Kostas Pardalis

Co-Founder

Shipping a system as a service is straightforward: you hide operational pain and charge for it.

Shipping a programming model (or any API surface big enough to feel like one) is different: the learning curve itself becomes the incumbent’s moat and the newcomer’s brick wall.

Compatibility tricks that work for wire protocols (Kafka- or Postgres-compatible)[1] don’t transfer to rich APIs and idioms, where the following applies:

Learning a new language without clear momentum is a hard sell.[2]

fenic is an inference-first DataFrame framework with LLM-native operators.

With fenic, we needed adoption without asking people to learn a brand-new dialect. The answer was LLM-native documentation via MCP: expose the user-facing API as code, tests, and comments, and give the model three simple tools for exploring it the way models actually do: structure scan (a glob-like project map for agentic IDEs), regex search, and contextual read.

No fine-tuning, no embeddings.[3] Just the API surface as code, shaped and delivered for model-driven browsing.

Why compatibility playbooks break for rich APIs

“Compatible with Kafka” works because the surface area is thin: a handful of endpoints and a wire protocol. “Compatible with Postgres” works for transactional CRUD. But dataframe APIs and query runtimes sprawl like programming languages: operators, idioms, error semantics, configuration, and a long tail of behaviors that teams learn over years. That learned behavior, and the time it takes to relearn it, is the moat.

The usual paths and where they stall

  • Build on the incumbent. Keep the familiar API and swap the engine (plugins, engine replacements). This avoids retraining users but inherits the incumbent’s semantics and constraints. You’re also forever chasing upstream decisions while trying to differentiate meaningfully.
  • Transpile. Accept an incumbent API and translate to your runtime. In practice, semantic mismatches, edge cases, and trust issues pile up.
  • Plan-level interop (e.g., Spark Connect). Consume logical plans instead of raw code. Easier to start with than transpiling, but you still must optimize, lower, and execute—plus you’re beholden to the source API’s evolution. Good for migration; weak as your long-term programming model.

All three get you closer to adoption, but none solve the core problem: people won’t invest in learning a new API surface unless the wins arrive fast and the experience feels coherent.

Our approach: MCP-Docs (code-native docs for LLMs)

Our observation is that standard documentation isn't optimal for models. Instead, we expose the user-facing API as code, tests, and comments and deliver it in a model-native shape that favors small, addressable units over narrative pages.

What we expose

  • User-facing API only by default (to avoid confusing the model with internals). We can flip a switch to expose the full internals for contributors.
  • Code + comments + tests as the canonical “documentation.” Tests double as idiomatic examples.
  • A simple index that organizes code into units that make sense (functions, classes, modules, docstrings), not just raw text; a sketch of such an index follows below.
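
One possible shape for that index, sketched with Python's standard ast module. The CodeUnit fields, the index_public_api helper, and the module layout are illustrative assumptions, not fenic's actual implementation.

```python
import ast
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CodeUnit:
    """One addressable documentation unit: a public top-level function or class."""
    module: str     # dotted module path, e.g. "fenic.dataframe" (illustrative)
    name: str       # public identifier
    kind: str       # "function" or "class"
    docstring: str  # docstring text, empty string if missing
    source: str     # full source of the unit, comments included


def index_public_api(package_root: Path) -> list[CodeUnit]:
    """Walk a package and collect its public (non-underscore) top-level units."""
    units: list[CodeUnit] = []
    for path in package_root.rglob("*.py"):
        text = path.read_text()
        tree = ast.parse(text)
        module = ".".join(path.relative_to(package_root.parent).with_suffix("").parts)
        for node in tree.body:
            is_unit = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
            if is_unit and not node.name.startswith("_"):
                units.append(CodeUnit(
                    module=module,
                    name=node.name,
                    kind="class" if isinstance(node, ast.ClassDef) else "function",
                    docstring=ast.get_docstring(node) or "",
                    source=ast.get_source_segment(text, node) or "",
                ))
    return units
```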

Three tools (optimized for model browsing)

  1. Structure scan: a glob-like project map for agentic IDEs that quickly lists folders, files, and public API modules so the model can discover scope before it drills down.
  2. Regex search: precise retrieval over identifiers and phrases; lets the model find candidates deterministically.
  3. Contextual read: return the entire pre-segmented code unit (e.g., a function/class/method with its docstring/comments), so the model gets right-sized, self-contained context by design (all three tools are sketched below).
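
A minimal sketch of how these three tools might be wired up, using the official Python MCP SDK's FastMCP helper. The server name, the docs_index module (holding the indexer sketched earlier), and the src/fenic layout are assumptions for illustration, not fenic's actual server.

```python
import re
from pathlib import Path

from mcp.server.fastmcp import FastMCP

from docs_index import index_public_api  # hypothetical module with the indexer sketched above

mcp = FastMCP("fenic-docs")                  # hypothetical server name
UNITS = index_public_api(Path("src/fenic"))  # assumed package layout


@mcp.tool()
def structure_scan() -> list[str]:
    """List public modules and their units so the model can discover scope first."""
    return sorted({f"{u.module}.{u.name} ({u.kind})" for u in UNITS})


@mcp.tool()
def regex_search(pattern: str) -> list[str]:
    """Return fully qualified names of units whose source matches the regex."""
    rx = re.compile(pattern)
    return [f"{u.module}.{u.name}" for u in UNITS if rx.search(u.source)]


@mcp.tool()
def contextual_read(qualified_name: str) -> str:
    """Return the whole pre-segmented unit (source, docstring, comments) by name."""
    for u in UNITS:
        if f"{u.module}.{u.name}" == qualified_name:
            return u.source
    return f"No unit named {qualified_name}"


if __name__ == "__main__":
    mcp.run()  # defaults to stdio, so any MCP-aware IDE or chat client can attach
```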

How it’s delivered

  • Packaged behind an MCP server, so any agentic IDE or chat (Claude Code, Cursor, ChatGPT Desktop, etc.) can attach the tools on demand.
  • No fine-tuning. No vector DB. No elaborate graph. Just the same mechanics models already use to traverse repos.

The result: when you ask, “write a fenic pipeline that joins events by session and summarizes text fields,” the model discovers the exact API entry points, studies real tests, and reads the right code units—without us flooding its context with the entire repo or prose it doesn’t need.

What changed for us (and our users)

Before this, trying to code fenic with an assistant was a loop of reloading context: paste docs, correct PySpark-isms (with_columns vs chaining with_column), fight drift into pandas, repeat after every context compaction. Now:

  • Faster first success. The model grounds on the actual API surface by fetching the exact code units it needs via structure scan, regex search, and contextual read, so new users get to a working pipeline in a single session.
  • Less back-and-forth. Instead of “link me the doc,” the model runs a search over the code index and gets answers with the function and comment that matter.
  • In-tool learning. You can ask, “show me three ways to do X in fenic and explain trade-offs,” and the assistant assembles examples inspired by the test suite and API modules.
  • Two audiences, one index. Keep the default to “public API only” for app developers; flip to “full internals” for contributors. Same pipeline; different exposure.

This isn’t magic. It’s giving models the environment they’re optimized for, and keeping humans in the loop where it counts.

Rough edges

  • Idiomatic vs merely correct. Models can produce working code that isn’t stylistically clean for fenic. Tests and comments help nudge toward better patterns, but style isn’t guaranteed.

  • Escape hatches. Because fenic can interoperate with pandas/polars, some generations drift out of fenic. If the goal is to exercise the fenic API, you may need to steer the assistant back.

  • Context compaction. Long chats still compact history. You sometimes have to re-issue a quick structure scan/search to re-ground.

  • Coverage and comments. The quality of generated code tracks the quality of the code comments and tests it is grounded in.

Try it

If you’re building your own API or “framework-sized” surface, try this. Index the public surface, keep tests clean and instructive, give the model the three tools, and let the agent loop do what it’s good at: read code, reason over adjacent context, and try again.
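
As a rough usage sketch, here is how a client (or agent harness) could attach to such a server and run the scan / search / read loop, assuming the official Python MCP SDK and that the server sketched earlier is saved as docs_server.py; the tool names and the with_column lookup are carried over from those illustrative sketches.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch the hypothetical docs server from the earlier sketch over stdio.
    server = StdioServerParameters(command="python", args=["docs_server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # The same loop an agent runs: scan for scope, search, then read a unit.
            print(await session.call_tool("structure_scan", {}))
            print(await session.call_tool("regex_search", {"pattern": r"with_column"}))
            print(await session.call_tool(
                "contextual_read",
                {"qualified_name": "fenic.dataframe.with_column"},  # illustrative name
            ))


asyncio.run(main())
```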

That’s how we’re getting past the moat, and why I’m optimistic about the value of LLMs: they have the potential to compress the cost of learning new programming models. The current moats built around that cost will start to dissolve, and that’s a good thing. It might feel threatening to businesses that rely on that inertia, but removing it lets better tools win on merit and lets teams move faster.

Footnotes

  1. See, for example, Redpanda; the Jepsen report describes the relationship between Redpanda and Kafka very well.

  2. Steve Klabnik said this while discussing the hard parts of building adoption for the Rust language. You can check the conversation here.

  3. Fine-tuning is too expensive and hard to do, and besides, what are you going to do, ship a mini LLM just for the documentation? Embeddings, on the other hand, are too much of a black box.
