fenic 0.5.0: Smarter Docs, Date Data Types, OpenRouter plus planning and reliability upgrades

Kostas Pardalis

Co-Founder

fenic 0.5.0 focuses on turning unstructured docs into structured data faster and more safely, while sharpening type handling, planning, and provider coverage.

What’s in it for you

  • Parse PDFs to clean markdown at scale with page chunking and Google model support.
  • Instant PDF metadata ingestion for discovery, filtering, and routing.
  • First‑class Date/Timestamp types with timezone transforms.
  • OpenRouter provider with provider routing and structured outputs; Sonnet 4.5 added.
  • Better guardrails: quota handling, token capacity checks, and clearer errors.
  • “fenic in 120 Seconds” quickstart notebooks to accelerate onboarding.

Document workflows

Parse PDFs into markdown with page chunking and Google support

  • semantic.parse_pdf now supports page separators and image descriptions.
  • Internally optimized token accounting reduces cost and improves batching.
  • Compatible with Google Gemini via the native file API; other providers work via standard text prompting.

Usage

python
import fenic as fc

# Configure Gemini as the default LM
session = fc.Session.get_or_create(
    fc.SessionConfig(
        app_name="blog_demo",
        semantic=fc.SemanticConfig(
            language_models={
                "gemini": fc.GoogleDeveloperLanguageModel(
                    model_name="gemini-2.0-flash",
                    rpm=100,
                    tpm=1000,
                )
            },
            default_language_model="gemini",
        ),
    )
)

# Discover PDFs, then parse into markdown
pdfs = session.read.pdf_metadata("data/docs/**/*.pdf", recursive=True)
markdown = pdfs.select(
    fc.col("file_path"),
    fc.semantic.parse_pdf(
        fc.col("file_path"),
        page_separator="--- PAGE {page} ---",
        describe_images=True,
    ).alias("markdown"),
)

Great for

  • Large PDF corpora where per‑page chunking improves throughput and prompt fit.
  • Document pipelines that mix PDF metadata filters with downstream parsing.
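
Once the markdown is collected into plain Python strings, the page separator makes per‑page chunking straightforward. The sketch below is a minimal illustration, not part of fenic: split_pages is a hypothetical helper, and it assumes the separator is emitted before each page's content with {page} rendered as an integer (for example "--- PAGE 3 ---").

```python
import re


def split_pages(markdown: str, separator_template: str = "--- PAGE {page} ---") -> dict[int, str]:
    """Split parsed markdown into {page_number: page_text}.

    Hypothetical helper: assumes each page starts with the rendered separator.
    Any text before the first separator is ignored.
    """
    # Build a regex from the template: literal parts escaped, page number captured.
    before, after = separator_template.split("{page}")
    pattern = re.compile(re.escape(before) + r"(\d+)" + re.escape(after))

    # With one capture group, split() yields [prefix, "1", text_1, "2", text_2, ...].
    parts = pattern.split(markdown)
    pages = {}
    for number, text in zip(parts[1::2], parts[2::2]):
        pages[int(number)] = text.strip()
    return pages


sample = "--- PAGE 1 ---\n# Title\nIntro text.\n--- PAGE 2 ---\n## Details\nMore text.\n"
pages = split_pages(sample)
print(sorted(pages))  # [1, 2]
```

Keeping the separator template in one place means the same string can be passed to parse_pdf and to the splitter, so the two cannot drift apart.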

Read PDF metadata as a DataFrame

  • Ingest size, page_count, author, creation/mod dates, encryption, signatures, image_count, and more.
  • Perfect for discovery, quality checks, and targeted parsing.

Usage

python
df = session.read.pdf_metadata("data/reports/**/*.pdf", recursive=True)
df.show()
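
As a rough illustration of "targeted parsing", the sketch below triages metadata rows before any model call is made. It works on plain dicts rather than a fenic DataFrame so it runs on its own. The file_path and page_count fields come from the examples above; the is_encrypted field name, the triage helper, and the max_pages threshold are assumptions made for illustration.

```python
def triage(rows, max_pages=200):
    """Partition metadata rows into (paths_to_parse, skipped_with_reason).

    Hypothetical helper: `is_encrypted` is an assumed field name.
    """
    to_parse, skipped = [], []
    for row in rows:
        if row.get("is_encrypted"):
            skipped.append((row["file_path"], "encrypted"))
        elif not row.get("page_count"):
            skipped.append((row["file_path"], "no pages"))
        elif row["page_count"] > max_pages:
            skipped.append((row["file_path"], "too long"))
        else:
            to_parse.append(row["file_path"])
    return to_parse, skipped


rows = [
    {"file_path": "a.pdf", "page_count": 12, "is_encrypted": False},
    {"file_path": "b.pdf", "page_count": 950, "is_encrypted": False},
    {"file_path": "c.pdf", "page_count": 8, "is_encrypted": True},
    {"file_path": "d.pdf", "page_count": 0, "is_encrypted": False},
]
to_parse, skipped = triage(rows)
print(to_parse)  # ['a.pdf']
```

In a real pipeline the same conditions would more naturally be expressed as DataFrame filters ahead of parse_pdf, so that only the selected files reach the model.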

Time‑aware types and transforms

You can now treat dates and timestamps as first‑class citizens, including timezone‑aware conversions.

  • New DateType and TimestampType across the stack.
  • Convert between UTC and local wall‑clock time with Spark‑style helpers (to_utc_timestamp and from_utc_timestamp).

Usage

python
import fenic as fc

# Parse timestamps, then convert to/from specific timezones
logs = session.read.docs("data/logs/*.json", content_type="json")

ts = logs.select(
    fc.dt.to_timestamp(fc.col("when"), "yyyy-MM-dd HH:mm:ss").alias("ts_utc")
)

la_local_then_utc = ts.select(
    fc.dt.to_utc_timestamp(fc.col("ts_utc"), "America/Los_Angeles").alias("la_to_utc")
)

utc_to_la_rendered_as_utc = ts.select(
    fc.dt.from_utc_timestamp(fc.col("ts_utc"), "America/Los_Angeles").alias(
        "utc_viewed_as_la_but_in_utc"
    )
)
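
The two directions are easy to mix up, so here is a minimal standard‑library sketch of the Spark semantics that helpers with these names usually follow. It assumes fenic's to_utc_timestamp and from_utc_timestamp behave like their Spark counterparts, which the "Spark‑style" wording suggests but this post does not spell out. The functions below are illustrative stand‑ins, not fenic code.

```python
from datetime import datetime, timedelta, timezone

try:
    from zoneinfo import ZoneInfo

    LA = ZoneInfo("America/Los_Angeles")
except Exception:
    # No tz database on this system: fall back to a fixed PST offset,
    # which is correct for the January example below.
    LA = timezone(timedelta(hours=-8))


def to_utc_timestamp(naive_local: datetime, tz) -> datetime:
    """Read a naive value as wall-clock time in `tz`; return the UTC wall-clock."""
    return naive_local.replace(tzinfo=tz).astimezone(timezone.utc).replace(tzinfo=None)


def from_utc_timestamp(naive_utc: datetime, tz) -> datetime:
    """Read a naive value as UTC; return the wall-clock time in `tz`."""
    return naive_utc.replace(tzinfo=timezone.utc).astimezone(tz).replace(tzinfo=None)


ts = datetime(2025, 1, 15, 12, 0, 0)

la_to_utc = to_utc_timestamp(ts, LA)    # noon in Los Angeles is 20:00 UTC
utc_as_la = from_utc_timestamp(ts, LA)  # noon UTC is 04:00 in Los Angeles

print(la_to_utc)  # 2025-01-15 20:00:00
print(utc_as_la)  # 2025-01-15 04:00:00
```

Under these semantics the two functions are inverses for any unambiguous local time, which is a handy sanity check when wiring them into a pipeline.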

More models and providers

  • OpenRouter provider support with provider routing and structured‑output strategies.
  • Sonnet 4.5 added to the model catalog.
  • Gemini native token counter for accurate cost/limit accounting.

OpenRouter quickstart

python
import fenic as fc

# Requires OPENROUTER_API_KEY
config = fc.SessionConfig(
    app_name="or_demo",
    semantic=fc.SemanticConfig(
        language_models={
            "or": fc.OpenRouterLanguageModel(
                model_name="openai/gpt-4o",
                profiles={
                    "default": fc.OpenRouterLanguageModel.Profile(
                        provider=fc.OpenRouterLanguageModel.Provider(sort="latency")
                    )
                },
                default_profile="default",
            )
        },
        default_language_model="or",
    ),
)
session = fc.Session.get_or_create(config)

When to use OpenRouter

  • Price/latency/throughput‑aware routing across multiple providers.
  • Structured outputs via response_format or forced tool‑calling when needed.

Reliability and performance

  • OpenAI 429 “quota exhausted” responses are no longer retried; they fail fast with clear errors.
  • Token‑capacity guardrails: raise early if a request cannot be satisfied by model limits.
  • Batch client stability fixes for OpenRouter; more predictable behavior.
  • Public S3/HuggingFace access works without credentials for public datasets.
  • parse_pdf returns a MarkdownType column for clearer semantics.
  • Better defaults for Gemini 2.5 Pro when no reasoning profiles are set.
  • Test hardening around non‑deterministic paths.

Upgrading from v0.4.x

  • Reader API: Use session.read.docs(paths, content_type="markdown" | "json") rather than nested .json()/.markdown() helpers.
  • OpenRouter: set OPENROUTER_API_KEY and configure with OpenRouterLanguageModel (provider routing is optional).
  • Removed o1-mini due to lack of JSON Schema structured outputs.
  • http_app no longer accepts a port parameter; use the ASGI app runner or CLI entrypoints to set server ports.
  • Gemini: native token counting is now used under the hood for accuracy.

Documentation and examples

  • “fenic in 120 Seconds”: a fast, practical notebook series that walks through core operators, joins, extraction, and more.
  • README/docs updated, plus simpler MCP example flow.
  • OpenRouter guidance with structured outputs, provider routing, and troubleshooting tips.

Try it out and tell us what you build

Install

bash
pip install fenic

Set a provider key

bash
# Choose at least one
export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export GOOGLE_API_KEY=...
export COHERE_API_KEY=...
export OPENROUTER_API_KEY=...

Hello, PDFs

python
import fenic as fc

session = fc.Session.get_or_create(
    fc.SessionConfig(
        app_name="hello_pdfs",
        semantic=fc.SemanticConfig(
            language_models={
                "gemini": fc.GoogleDeveloperLanguageModel(
                    model_name="gemini-2.0-flash", rpm=100, tpm=1000
                )
            },
            default_language_model="gemini",
        ),
    )
)

pdfs = session.read.pdf_metadata("data/**/*.pdf", recursive=True)
md = pdfs.select(
    fc.col("file_path"),
    fc.semantic.parse_pdf(
        fc.col("file_path"),
        page_separator="--- PAGE {page} ---",
    ).alias("markdown"),
)
md.show()

Questions or ideas? We’d love your feedback. If you hit an edge case, file an issue with a small repro and we’ll jump on it.
