fenic 0.5.0 focuses on turning unstructured docs into structured data faster and safer, while sharpening type handling, planning, and provider coverage.
What’s in it for you
- Parse PDFs to clean markdown at scale with page chunking and Google model support.
- Instant PDF metadata ingestion for discovery, filtering, and routing.
- First‑class Date/Timestamp types with timezone transforms.
- OpenRouter provider with provider routing and structured outputs; Sonnet 4.5 added.
- Better guardrails: quota handling, token capacity checks, and clearer errors.
- “fenic in 120 Seconds” quickstart notebooks to accelerate onboarding.
Document workflows
Parse PDFs into markdown with page chunking and Google support
semantic.parse_pdfnow supports page separators and image descriptions.- Internally optimized token accounting reduces cost and improves batching.
- Compatible with Google Gemini via the native file API; other providers work via standard text prompting.
Usage
pythonimport fenic as fc # Configure Gemini as the default LM session = fc.Session.get_or_create( fc.SessionConfig( app_name="blog_demo", semantic=fc.SemanticConfig( language_models={ "gemini": fc.GoogleDeveloperLanguageModel( model_name="gemini-2.0-flash", rpm=100, tpm=1000, ) }, default_language_model="gemini", ), ) ) # Discover PDFs, then parse into markdown pdfs = session.read.pdf_metadata("data/docs/**/*.pdf", recursive=True) markdown = pdfs.select( fc.col("file_path"), fc.semantic.parse_pdf( fc.col("file_path"), page_separator="--- PAGE {page} ---", describe_images=True, ).alias("markdown"), )
Great for
- Large PDF corpora where per‑page chunking improves throughput and prompt fit.
- Document pipelines that mix PDF metadata filters with downstream parsing.
Read PDF metadata as a DataFrame
- Ingest size, page_count, author, creation/mod dates, encryption, signatures, image_count, and more.
- Perfect for discovery, quality checks, and targeted parsing.
Usage
pythondf = session.read.pdf_metadata("data/reports/**/*.pdf", recursive=True) df.show()
Time‑aware types and transforms
You can now treat dates and timestamps as first‑class citizens, including timezone‑aware conversions.
- New DateType and TimestampType across the stack.
- Convert to/from UTC wall‑clock semantics with Spark‑style helpers.
Usage
pythonimport fenic as fc # Parse timestamps, then convert to/from specific timezones logs = session.read.docs("data/logs/*.json", content_type="json") ts = logs.select( fc.dt.to_timestamp(fc.col("when"), "yyyy-MM-dd HH:mm:ss").alias("ts_utc") ) la_local_then_utc = ts.select( fc.dt.to_utc_timestamp(fc.col("ts_utc"), "America/Los_Angeles").alias("la_to_utc") ) utc_to_la_rendered_as_utc = ts.select( fc.dt.from_utc_timestamp(fc.col("ts_utc"), "America/Los_Angeles").alias( "utc_viewed_as_la_but_in_utc" ) )
More models and providers
- OpenRouter provider support with provider routing and structured outputs strategies.
- Sonnet 4.5 added to the model catalog.
- Gemini native token counter for accurate cost/limit accounting.
OpenRouter quickstart
pythonimport fenic as fc # Requires OPENROUTER_API_KEY config = fc.SessionConfig( app_name="or_demo", semantic=fc.SemanticConfig( language_models={ "or": fc.OpenRouterLanguageModel( model_name="openai/gpt-4o", profiles={ "default": fc.OpenRouterLanguageModel.Profile( provider=fc.OpenRouterLanguageModel.Provider(sort="latency") ) }, default_profile="default", ) }, default_language_model="or", ), ) session = fc.Session.get_or_create(config)
When to use OpenRouter
- Price/latency/throughput‑aware routing across multiple providers.
- Structured outputs via response_format or forced tool‑calling when needed.
Reliability and performance
- Do not retry OpenAI 429 “quota exhausted” responses; fail fast with clear errors.
- Token‑capacity guardrails: raise early if a request cannot be satisfied by model limits.
- Batch client stability fixes for OpenRouter; more predictable behavior.
- Public S3/HuggingFace access works without credentials for public datasets.
parse_pdfreturns aMarkdownTypecolumn for clearer semantics.- Better defaults for Gemini 2.5 Pro when no reasoning profiles are set.
- Test hardening around non‑deterministic paths.
Upgrading from v0.4.x
- Reader API: Use
session.read.docs(paths, content_type="markdown" | "json")rather than nested.json()/.markdown()helpers. - OpenRouter: set
OPENROUTER_API_KEYand configure withOpenRouterLanguageModel(provider routing is optional). - Removed
o1-minidue to lack of JSON Schema structured outputs. http_appno longer accepts aportparameter; use the ASGI app runner or CLI entrypoints to set server ports.- Gemini: native token counting is now used under the hood for accuracy.
Documentation and examples
- “fenic in 120 Seconds” — a fast, practical notebook series that walks through core operators, joins, extraction, and more.
- README/docs updated, plus simpler MCP example flow.
- OpenRouter guidance with structured outputs, provider routing, and troubleshooting tips.
Try it out and tell us what you build
Install
bashpip install fenic
Set a provider key
bash# Choose at least one export OPENAI_API_KEY=... export ANTHROPIC_API_KEY=... export GOOGLE_API_KEY=... export COHERE_API_KEY=... export OPENROUTER_API_KEY=...
Hello, PDFs
pythonimport fenic as fc session = fc.Session.get_or_create( fc.SessionConfig( app_name="hello_pdfs", semantic=fc.SemanticConfig( language_models={ "gemini": fc.GoogleDeveloperLanguageModel( model_name="gemini-2.0-flash", rpm=100, tpm=1000 ) }, default_language_model="gemini", ), ) ) pdfs = session.read.pdf_metadata("data/**/*.pdf", recursive=True) md = pdfs.select( fc.col("file_path"), fc.semantic.parse_pdf( fc.col("file_path"), page_separator="--- PAGE {page} ---", ).alias("markdown"), ) md.show()
Questions or ideas? We’d love your feedback and if you hit an edge case, file an issue with a small repro and we’ll jump on it.

