TL;DR Upgrade now to unlock Rust-powered Jinja templates, built-in fuzzy matching, Pydantic-driven schemas, persistent views, new semantic functions, performance boosts, and a smoother developer experience.
```bash
pip install --upgrade fenic
```
What’s in It for You
- Dynamic Templating: Jinja as a first-class column function
- Robust Fuzzy Text Matching: 6 similarity algorithms, 3 comparison modes
- Typed Semantics: Full Pydantic support in all semantic operators
- Composable Pipelines: Persistent “views” you can save & reuse
- Fresh Functions & Models: New semantic APIs, embedding providers
- Speed & DX: Rust-backed cores, leaner setup, better defaults
Jinja templates
Jinja has played a key role in the Python data ecosystem — it's what powers tools like dbt, and it's also widely used to improve the ergonomics of working with prompts.
It’s a templating DSL that developers and models understand well, making it an easy choice to standardize around for prompt management.
We’ve added Jinja as a column function, which means you can do things like the following:

```python
df.select(
    text.jinja(
        "Hello {{ name }}{% if vip %} (VIP){% endif %}!",
        name=col("customer_name"),
        vip=col("is_vip"),
    )
)
```
Notice how the keyword arguments map dataframe columns to Jinja template variables.
The column function is implemented in Rust, so it’s fast.
Beyond performance, fenic walks the Jinja AST to determine how each template variable is used. This lets us validate the template against fenic's types during planning, failing as early as possible and improving reliability and DX tremendously.
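As an illustration of the kind of analysis this enables, here is a small sketch using the standard `jinja2` Python library (not fenic's Rust implementation): parsing a template into an AST and collecting the variables it expects, without rendering anything.

```python
from jinja2 import Environment, meta

env = Environment()

# Parse the template into an AST without rendering it
template_src = "Hello {{ name }}{% if vip %} (VIP){% endif %}!"
ast = env.parse(template_src)

# find_undeclared_variables walks the AST and collects every
# variable the template expects to be supplied at render time
variables = meta.find_undeclared_variables(ast)
print(sorted(variables))  # ['name', 'vip']
```

With the variable set in hand, an engine can cross-check it against the columns actually passed in, which is the essence of the plan-time validation described above.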
Jinja + Semantic Operations
Having support for rendering arbitrary Jinja templates is cool, but what’s even cooler is how Fenic integrates Jinja into its semantic operators.
This allows fenic to support dynamic prompts that adapt based on column data. Previously, semantic operators could reference the columns of the dataframe as part of the prompt, but these prompts were static.
Now it's possible to reference column values directly in prompts and use Jinja syntax for array iteration, array access, struct access, and boolean control flow.
Example:
```python
result = source.select(
    semantic.predicate(
        dedent("""\
            Given the
            ### Required Qualifications:
            {% for req in job.requirements %}
            - {{ req }}
            {% endfor %}

            ### Candidate Resume:
            {{ resume.name }}
            {{ resume.age }}
            {% for exp in resume.experience %}
            {{ exp.company }}: {{ exp.title }} - {{ exp.description }}
            {% endfor %}

            Is the candidate a good fit for the role?
            """),
        job=col("job"),
        resume=col("resume"),
        examples=examples,
    ).alias("is_good_fit")
)
```
Jinja templates are making semantic.map and semantic.predicate even more powerful.
For semantic.reduce in particular, you can now take the temporal ordering of your data into account in the prompt, simply by using the order_by argument.
Why is this important for reduce?
semantic.reduce greedily packs context windows, handling all the complexities of chunking and recursively applying inference to your data.
When you specify an ordering, it preserves temporal coherence throughout the semantic.reduce operation by sorting records in each group before applying the prompt.
This is especially valuable for transcripts, conversations, or time-series data—helping the model generate more coherent narratives and significantly improving result quality, all while keeping the interface simple and intuitive.
Example with semantic.reduce:
```python
semantic.reduce(
    "Summarize events for {{location}}",
    col("events"),
    group_context={"location": col("location")},
    order_by=col("timestamp").asc(),
)
```
fenic ensures that templates adhere to strict rules to guarantee clean integration with fenic expressions, safe rendering, and explicit variable dependency tracking.
Fuzzy string matching capabilities
A core belief behind fenic is that AI will fulfill its promise when paired with the proven engineering practices that power today’s software systems.
LLMs are being used heavily for extracting structured information from unstructured textual data and for performing entity recognition and extraction.
To ground these results and make these pipelines more robust, we need to incorporate techniques like fuzzy string matching.
Additionally, fuzzy string matching can be used for blocking - filtering out easy matches or unnecessary comparisons during preprocessing, before processing data using more expensive resolution techniques like model inference or semantic search.
In this release, we introduce the first of a series of features providing 6 similarity algorithms across 3 comparison modes, based on rapidfuzz, to serve as primitives for text matching, deduplication, and record linkage workflows.
What's new:
Three fuzzy matching functions:
- text.compute_fuzzy_ratio() - Direct string similarity comparison
- text.compute_fuzzy_token_sort_ratio() - Order-independent comparison after token sorting
- text.compute_fuzzy_token_set_ratio() - Set-based comparison ignoring duplicates and order
Six similarity algorithms:
- Indel: Pure insertion-deletion distance (no substitutions)
- Levenshtein: Classic edit distance
- Damerau-Levenshtein: Edit distance with transpositions
- Jaro: Character proximity-based similarity
- Jaro-Winkler: Jaro with prefix boost
- Hamming: Position-by-position comparison with auto-padding
All of the above are implemented in Rust and exposed through fenic's Python DataFrame API for performance.
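To make the semantics concrete, here is a minimal pure-Python sketch (illustrative only, not fenic's Rust implementation) of a Levenshtein-based similarity ratio and the token-sort variant. The normalization choice (dividing by the longer string's length) is one common convention among several.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

def fuzzy_ratio(a: str, b: str) -> float:
    # Normalized similarity in [0, 1]
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def token_sort_ratio(a: str, b: str) -> float:
    # Order-independent: sort whitespace tokens before comparing
    normalize = lambda s: " ".join(sorted(s.split()))
    return fuzzy_ratio(normalize(a), normalize(b))

print(round(fuzzy_ratio("kitten", "sitting"), 3))  # 0.571
print(token_sort_ratio("new york", "york new"))    # 1.0
```

The token-sort variant shows why it is useful for record linkage: "new york" and "york new" score a perfect match once word order is factored out.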
Full support for Pydantic schemas in semantic operators
We previously supported two ways to define schemas for semantic.extract: a custom schema format and Pydantic models.
The custom format initially offered more features, but over time, we realized that Pydantic’s ecosystem, clarity, and flexibility made it the better choice.
So we brought our Pydantic support up to feature parity—adding support for nested models, lists, and optional fields—and are now deprecating the custom ExtractSchema format to simplify the API and focus on a single, robust standard.
In more detail, we now support:
- Complex nested Pydantic models
- Lists
- Optionals
We also improved the extraction system prompt and enhanced model-to-prompt generation, while deprecating the previously supported custom ExtractSchema.
One of the coolest features related to this is that semantic.map now has support for structured output by optionally providing a schema as a Pydantic model!
```python
from pydantic import BaseModel, Field

# Define a schema for structured output
class EmailResponse(BaseModel):
    subject: str = Field(description="A concise, professional email subject line")
    body: str = Field(description="A complete, empathetic email body")
    urgency_level: str = Field(description="Assessment of how quickly this requires follow-up")

# Use the schema with semantic.map
result = semantic.map(
    "Generate a customer service email response for: {{support_ticket}}",
    schema=EmailResponse,
)
```
This streamlines the use of semantic.map, eliminating much of the boilerplate and output parsing required to extract structure from model outputs.
Persistent views → Pipeline Reuse Made Easy
Save any DataFrame as a view in the Fenic catalog:
```python
df.write.save_as_view("view_name")  # store the dataframe as a view in fenic

session.catalog.does_view_exist("view_name")  # check if a view exists

# drop a view from the catalog
session.catalog.drop_view("typedef_default.df3", ignore_if_not_exists=False)

# load a stored view and continue building on the dataframe
df = session.view("view_name").select(...).filter(...)
```
- Compose complex workflows
- Query or transform views just like tables
New Functions & Models
We've added a variety of new functions in this release. Here are a few of them:
- greatest/least column functions
- semantic summarization function
- Cohere embeddings
- Google Gemini embeddings
Both Cohere and Gemini embeddings come with configurable profiles, meaning you can choose your embedding dimensionality via matryoshka embeddings to trade off processing speed against accuracy.
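The idea behind matryoshka embeddings, sketched in plain Python (illustrative only; the embedding providers handle this for you): a matryoshka-trained model packs the most important information into the leading dimensions, so truncating a vector and renormalizing it yields a cheaper, slightly less precise embedding that still works with cosine similarity.

```python
import math

def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    # Keep only the leading `dims` dimensions...
    head = vec[:dims]
    # ...then renormalize to unit length so cosine similarity still works
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.6, 0.8, 0.05, -0.02]      # toy 4-dim embedding
small = truncate_embedding(full, 2)  # cheaper 2-dim version
print(small)                         # approximately [0.6, 0.8]
```

Smaller vectors mean faster similarity search and lower storage, at the cost of some fidelity, which is exactly the speed-vs-accuracy dial the profiles expose.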
For the full list, see the release notes
Performance and DX improvements
The initial release generated a great deal of feedback about fenic's performance and DX.
Here are some of the improvements that have made it to this release.
- Switched to scikit-learn KMeans
  - Control the clustering quality ↔ performance tradeoff; more stable, reproducible results
- Model profile configs
  - OpenAILanguageModel.Profile: reasoning_effort (low/medium/high)
  - AnthropicLanguageModel.Profile: thinking_token_budget
  - GoogleDeveloperLanguageModel.Profile & GoogleVertexLanguageModel.Profile: thinking_token_budget
- Enhanced semantic.classify
  - Optional class descriptions for richer context
  - Backward-compatible with string-list input; dropped enum support
- Lean session setup
  - SemanticConfig now optional in SessionConfig
  - language_models optional within SemanticConfig → use fenic for pure OLAP workloads without semantic overhead
- Transcript support
  - Added basic WebVTT format compatibility
Bug fixes & Documentation
This release wouldn't be complete without a healthy number of bug fixes and continued iteration on our documentation. More on that can be found in the release notes here.
Try It Out! Share Your Feedback!
- Upgrade: pip install --upgrade fenic
- Explore: Check out the 0.3.0 docs
- Engage: Star the repo, file issues, or drop into Discord
Thank you for being part of the Fenic community; your feedback drives every release!
Until next time,
The Fenic Team