
Fenic 0.3.0 Released: Rust-powered Jinja, Fuzzy Matching & More

Kostas Pardalis

Co-Founder

TL;DR   Upgrade now to unlock Rust-powered Jinja templates, built-in fuzzy matching, Pydantic-driven schemas, persistent views, new semantic functions, performance boosts, and a smoother developer experience.

Upgrade Today

```bash
pip install --upgrade fenic
```

What’s in It for You

  • Dynamic Templating: Jinja as a first-class column function
  • Robust Fuzzy Text Matching: 6 similarity algorithms, 3 comparison modes
  • Typed Semantics: Full Pydantic support in all semantic operators
  • Composable Pipelines: Persistent “views” you can save & reuse
  • Fresh Functions & Models: New semantic APIs, embedding providers
  • Speed & DX: Rust-backed cores, leaner session setup, better defaults

Jinja templates

Jinja has played a key role in the Python data ecosystem — it's what powers tools like dbt, and it's also widely used to improve the ergonomics of working with prompts.

It’s a templating DSL that developers and models understand well, making it an easy choice to standardize on for prompt management.

We’ve added Jinja as a column function, which means you can do things like the following:

```python
df.select(
    text.jinja(
        "Hello {{ name }}{% if vip %} (VIP){% endif %}!",
        name=col("customer_name"),
        vip=col("is_vip"),
    )
)
```

Notice how the keyword arguments bind dataframe columns to the Jinja template variables.

The column function is written in Rust so it’s fast.

In addition to performance, fenic is able to walk the Jinja AST and determine how template variables are being used. The cool thing about this is that the template can be validated against fenic's types during planning, failing as fast as possible and improving reliability and DX tremendously.
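To make the idea concrete, here is a toy pure-Python sketch of this kind of plan-time validation. Fenic's real implementation walks the actual Jinja AST in Rust; the regex below is only an illustration that handles simple `{{ var }}` and `{% if var %}` references:

```python
import re

# Toy stand-in for AST-based variable discovery: collect the variable
# names a template references so they can be checked against the
# dataframe's columns before any row is rendered.
VAR_PATTERN = re.compile(r"{{\s*(\w+)|{%\s*if\s+(\w+)")

def template_variables(template: str) -> set[str]:
    """Return the top-level variable names referenced by a template."""
    names = set()
    for match in VAR_PATTERN.finditer(template):
        names.add(match.group(1) or match.group(2))
    return names

def validate_template(template: str, columns: set[str]) -> None:
    """Fail fast if the template references a column that does not exist."""
    missing = template_variables(template) - columns
    if missing:
        raise ValueError(f"Unknown template variables: {sorted(missing)}")

template = "Hello {{ name }}{% if vip %} (VIP){% endif %}!"
validate_template(template, {"name", "vip"})  # passes silently
```

Failing at plan time rather than mid-pipeline is the whole point: a typo in a template surfaces before any inference dollars are spent.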

Jinja + Semantic Operations

Having support for rendering arbitrary Jinja templates is cool, but what’s even cooler is how Fenic integrates Jinja into its semantic operators.

This allows fenic to support dynamic prompts that adapt based on column data. Previously, semantic operators could reference the columns of the dataframe as part of the prompt, but these prompts were static.

Now it's possible to reference column values directly in prompts and use Jinja syntax for array iteration, array access, struct access, and boolean control flow.

Example:

```python
result = source.select(
    semantic.predicate(
        dedent("""\
            Given the
            ### Required Qualifications:
            {% for req in job.requirements %}
            - {{ req }}
            {% endfor %}
            ### Candidate Resume:
            {{ resume.name }}
            {{ resume.age }}
            {% for exp in resume.experience %}
            {{ exp.company }}: {{ exp.title }} - {{ exp.description }}
            {% endfor %}
            Is the candidate a good fit for the role?
        """),
        job=col("job"),
        resume=col("resume"),
        examples=examples,
    ).alias("is_good_fit")
)
```

Jinja templates are making semantic.map and semantic.predicate even more powerful.

For semantic.reduce in particular, you can now easily take the temporal ordering of your data into account using the order_by argument.

Why is this important for reduce?

semantic.reduce greedily packs context windows, handling all the complexities of chunking and recursively applying inference to your data.

When you specify an ordering, it preserves temporal coherence throughout the semantic.reduce operation by sorting records in each group before applying the prompt.

This is especially valuable for transcripts, conversations, or time-series data—helping the model generate more coherent narratives and significantly improving result quality, all while keeping the interface simple and intuitive.

Example with semantic.reduce:

```python
semantic.reduce(
    "Summarize events for {{location}}",
    col("events"),
    group_context={"location": col("location")},
    order_by=col("timestamp").asc(),
)
```

fenic ensures that templates adhere to strict rules to guarantee clean integration with fenic expressions, safe rendering, and explicit variable dependency tracking.

Fuzzy string matching capabilities

A core belief behind fenic is that AI will fulfill its promise when paired with the proven engineering practices that power today’s software systems.

LLMs are being used heavily for extracting structured information from unstructured textual data and for performing entity recognition and extraction.

To ground the results of the above and make these pipelines more robust, we need to incorporate techniques like fuzzy string matching.

Additionally, fuzzy string matching can be used for blocking: filtering out easy matches or unnecessary comparisons during preprocessing, before processing data using more expensive resolution techniques like model inference or semantic search.
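As a minimal sketch of the blocking idea, the stdlib's `difflib.SequenceMatcher` can stand in for fenic's Rust-backed similarity functions: a cheap similarity score filters out obvious non-matches before the expensive resolution step ever runs.

```python
from difflib import SequenceMatcher

# Sketch of fuzzy blocking: keep only candidate pairs whose cheap
# similarity score clears a threshold, so expensive resolution
# (model inference, semantic search) runs on far fewer pairs.
def block_candidates(left, right, threshold=0.6):
    """Return (a, b, score) for pairs whose similarity ratio >= threshold."""
    pairs = []
    for a in left:
        for b in right:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return pairs

companies = ["Acme Corp", "Globex"]
records = ["ACME Corporation", "Initech LLC"]
print(block_candidates(companies, records))
```

In a real pipeline the threshold is tuned so that blocking rarely discards true matches, since anything filtered here never reaches the resolver.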

In this release, we introduce the first of a series of features providing 6 similarity algorithms across 3 comparison modes, based on rapidfuzz, to serve as primitives for text matching, deduplication, and record linkage workflows.

What's new:

Three fuzzy matching functions:

  • text.compute_fuzzy_ratio() - Direct string similarity comparison
  • text.compute_fuzzy_token_sort_ratio() - Order-independent comparison after token sorting
  • text.compute_fuzzy_token_set_ratio() - Set-based comparison ignoring duplicates and order
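The difference between the two token-based modes comes down to the preprocessing each applies before scoring. A small pure-Python sketch (fenic's versions are Rust-backed via rapidfuzz; these helpers only illustrate the normalization step):

```python
def tokens(s: str) -> list[str]:
    return s.lower().split()

def token_sort_key(s: str) -> str:
    # Order-independent mode: sort the tokens, then compare joined strings.
    return " ".join(sorted(tokens(s)))

def token_set_key(s: str) -> str:
    # Set-based mode: drop duplicates *and* order before comparing.
    return " ".join(sorted(set(tokens(s))))

a = "new york new york"
b = "york new"
print(token_sort_key(a))                   # duplicates kept, order removed
print(token_set_key(a) == token_set_key(b))  # both collapse to "new york"
```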

Six similarity algorithms:

  • Indel: Pure insertion-deletion distance (no substitutions)
  • Levenshtein: Classic edit distance
  • Damerau-Levenshtein: Edit distance with transpositions
  • Jaro: Character proximity-based similarity
  • Jaro-Winkler: Jaro with prefix boost
  • Hamming: Position-by-position comparison with auto-padding

All of the above are implemented in Rust for performance and exposed through fenic's Python DataFrame API.
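For reference, here is what one of these algorithms computes, as a pure-Python sketch of the classic Levenshtein edit distance (fenic's implementation lives in Rust and you would use `text.compute_fuzzy_ratio()` rather than writing this yourself):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[len(b)]

print(levenshtein("kitten", "sitting"))  # 3
```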

Full support for Pydantic schemas in semantic operators

We previously supported two ways to define schemas for semantic.extract: a custom schema format and Pydantic models.

The custom format initially offered more features, but over time, we realized that Pydantic’s ecosystem, clarity, and flexibility made it the better choice.

So we brought our Pydantic support up to feature parity—adding support for nested models, lists, and optional fields—and are now deprecating the custom ExtractSchema format to simplify the API and focus on a single, robust standard.

In more detail, we now support:

  1. Complex nested Pydantic models
  2. Lists
  3. Optionals
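To illustrate what these three features look like together, here is a hypothetical schema (model and field names are ours, not from fenic) that could now be passed directly to semantic.extract:

```python
from typing import Optional
from pydantic import BaseModel, Field

# Illustrative schema exercising the newly supported features:
# nested models, lists, and optional fields.
class Experience(BaseModel):
    company: str
    title: str
    description: Optional[str] = None  # optional field

class Resume(BaseModel):
    name: str
    age: Optional[int] = None
    # list of nested models
    experience: list[Experience] = Field(default_factory=list)

resume = Resume(
    name="Ada",
    experience=[{"company": "Acme", "title": "Engineer"}],
)
```

Pydantic validates and coerces the nested dict into an `Experience` instance, which is exactly the kind of structure the extraction operators can now target.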

We also improved the extraction system prompt and enhanced model-to-prompt generation, while deprecating the previously supported custom ExtractSchema.

One of the coolest features related to this is that semantic.map now has support for structured output by optionally providing a schema as a Pydantic model!

```python
from pydantic import BaseModel, Field

# Define a schema for structured output
class EmailResponse(BaseModel):
    subject: str = Field(description="A concise, professional email subject line")
    body: str = Field(description="A complete, empathetic email body")
    urgency_level: str = Field(description="Assessment of how quickly this requires follow-up")

# Use the schema with semantic.map
result = semantic.map(
    "Generate a customer service email response for: {{support_ticket}}",
    schema=EmailResponse,
)
```

This streamlines the use of semantic.map, eliminating much of the boilerplate and output parsing required to extract structure from model outputs.

Persistent views → Pipeline Reuse Made Easy

Save any DataFrame as a view in the Fenic catalog:

```python
# stores the dataframe as a view in fenic
df.write.save_as_view("view_name")

# checks if a view exists
session.catalog.does_view_exist("view_name")

# drop a view from the catalog
session.catalog.drop_view("typedef_default.df3", ignore_if_not_exists=False)

# load a stored view and continue building on the dataframe
df = session.view("view_name").select(...).filter(...)
```
  • Compose complex workflows
  • Query or transform views just like tables

New Functions & Models

We've added a variety of new functions in this release. Here are a few of them:

  • greatest/least column functions
  • semantic summarization function
  • Cohere embeddings
  • Google Gemini embeddings

Both Cohere and Gemini embeddings come with configurable profiles. That means you can choose your embedding dimensionality via matryoshka embeddings to trade off processing speed against accuracy.
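The intuition behind matryoshka embeddings is that a matryoshka-trained vector can simply be truncated to its first k dimensions and renormalized. Fenic's provider profiles handle this configuration for you; the sketch below (our own helper, not a fenic API) just shows the underlying math:

```python
import math

def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and renormalize to unit length,
    trading some accuracy for cheaper storage and faster similarity search."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [3.0, 4.0, 1.0, 2.0]            # stand-in for a model's full embedding
small = truncate_embedding(full, 2)    # first two dims, re-normalized
print(small)
```

Because the truncated vector is renormalized, cosine similarity still behaves as expected in the lower-dimensional space.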

For the full list, see the release notes.

Performance and DX improvements

The initial release generated a great deal of feedback about fenic's performance and DX.

Here are some of the improvements that have made it to this release.

  • Switched to scikit-learn KMeans

    Control clustering quality ↔ performance; more stable, reproducible results.

  • Model profile configs

    • OpenAILanguageModel.Profile: reasoning_effort (low/medium/high)
    • AnthropicLanguageModel.Profile: thinking_token_budget
    • GoogleDeveloperLanguageModel.Profile & GoogleVertexLanguageModel.Profile: thinking_token_budget
  • Enhanced semantic.classify

    • Optional class descriptions for richer context
    • Backward-compatible with string-list input; dropped enum support
  • Lean session setup

    • SemanticConfig now optional in SessionConfig
    • language_models optional within SemanticConfig → Use Fenic for pure OLAP workloads without semantic overhead
  • Transcript support

    • Added basic WebVTT format compatibility

Bug fixes & Documentation

This release wouldn't be complete without a healthy number of bug fixes and documentation improvements. More details can be found in the release notes.

Try It Out! Share Your Feedback!

  1. Upgrade: pip install --upgrade fenic
  2. Explore: Check out the 0.3.0 docs
  3. Engage: Star the repo, file issues, or drop into Discord

Thank you for being part of the Fenic community; your feedback drives every release!

Until next time,

The Fenic Team
