We love the HN firehose, but we miss gems all the time. When we looked for a great use case for a Fenic powered agent, we kept bumping into HN’s basic search.
What if we could use LLMs and a tight agentic loop to surface the discussions we actually care about?
That question kicked off a fun project that ended with an agent answering on behalf of the HN hivemind.
The result is a deep research agent, built with Fenic (dataframes + MCP server) and PydanticAI (agent orchestration), that turns sprawling HN threads into structured, citation‑backed insights.
Code repository: typedef-ai/fenic-examples · hn_agent
(See the Appendix at the end of the post for three example research runs.)
Problem & Goals
We wanted an agent that answers open questions the way HN actually discussed them, grounded in threads, with citations, and strong recall/precision.
Instead of relying on HN's keyword‑based search, which offers little flexibility for filtering the data in any other way, we wanted to ask questions about the meaning of, and the opinions in, the community's conversations.
For example, instead of searching for the term "Rust" and then trying to figure out which stories are relevant, ask "What's the opinion of the HN community about Rust?"
And the agent should not just surface the most relevant stories, but also guide the user to what to look for in the surfaced threads and posts, and most importantly, why.
This post shares how we built it and the design choices that made it both accessible and effective. Our goals:
- Show how accessible it is to build a sophisticated, high‑precision deep research agent by pairing Fenic’s curated context with PydanticAI’s planning.
- Let practitioners (engineers, analysts, PMs) ask open‑ended questions and get evidence‑backed answers grounded in HN discussions.
- Offload inference (e.g., summarization) to Fenic tools so the agent spends its tokens on reasoning and synthesis rather than rote work.
- Prefer reproducible, typed outputs over prose to enable downstream analysis: e.g., aggregate themes across stories in Fenic, join to metadata, and compute trends with citations.
TL;DR: What You’ll Learn
Could an agent explain “Rust vs Go” the way HN actually debated it, across scattered threads, while preserving precision and citations?
We built exactly that: a deep research agent that turns HN into structured, typed insights.
Why This Works
- The right tools, not too many: `search_stories`, `read_story`, `summarize_story`. A tiny, well‑named toolset is easier for tool‑using models to plan with and much harder to misuse.
- Typed outputs end‑to‑end: predictable summaries and a structured final report make downstream analysis trivial.
- Curated context: model the data appropriately and expose the right tools (e.g., regex keyword search that LLMs are good at), then precompute `comment_to_story`, `story_threads`, and `story_discussions` (markdown) so the agent doesn't wrangle recursion during research.
Why recursion is inevitable here
- HN comments form trees: each comment points to a parent (or a story root). Mapping a comment to its story or reconstructing discussion order requires walking parent chains. Naively, that’s a recursive join every time; the denormalized lookups avoid it.
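To make that concrete, here's a minimal sketch of the flattening step in plain Python, with dicts standing in for Fenic tables. The `id`/`parent`/`type` fields follow the HN items schema; everything else is illustrative, not the repo's implementation:

```python
# Minimal sketch: flatten comment->parent chains into a comment_to_story lookup.
# Plain-Python stand-in for the Fenic denormalization step.

def build_comment_to_story(items: dict[int, dict]) -> dict[int, int]:
    """Map every item id to the id of its root story."""
    comment_to_story: dict[int, int] = {}

    def resolve(item_id: int) -> int:
        # Walk the parent chain, memoizing every hop so each node is visited once.
        if item_id in comment_to_story:
            return comment_to_story[item_id]
        item = items[item_id]
        root = item_id if item["type"] == "story" else resolve(item["parent"])
        comment_to_story[item_id] = root
        return root

    for item_id, item in items.items():
        if item["type"] == "comment":
            resolve(item_id)
    return comment_to_story

# Usage: a story (1) with a nested comment chain (2 -> 3).
items = {
    1: {"id": 1, "type": "story"},
    2: {"id": 2, "type": "comment", "parent": 1},
    3: {"id": 3, "type": "comment", "parent": 2},
}
assert build_comment_to_story(items) == {1: 1, 2: 1, 3: 1}
```

Because every hop is memoized, each node is resolved once and the whole table is built in a single pass, instead of a recursive join on every query.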
What’s Under the Hood?
Components

- Data layer (Fenic): loads base tables and builds denormalized/derived tables.
- Dataset source (Hugging Face via Fenic `hf://`): share and reuse datasets without bespoke loaders.
- MCP tools (Fenic): three tools registered in the session catalog and served over HTTP.
- MCP server (Fenic): discovers tools from the catalog and exposes them at `http://localhost:8080/mcp`.
- Research agent (PydanticAI): bounded planning loop with typed `DeepResearchReport` output.
- CLI: runs questions end‑to‑end and prints structured results.
Data Flow

- Ingest → denormalize → format:
  - Load base tables (items, comments, users, etc.) from Hugging Face into Fenic.
  - Build `comment_to_story` (every comment → root story) and `story_threads` (hierarchical paths) to eliminate recursion at query time.
  - Render `story_discussions` as markdown transcripts for efficient summarization.
- Serve tools:
  - Register tools in the same Fenic session; list and expose them via the Fenic MCP server (a rough sketch follows this list).
- Orchestrate research:
  - A PydanticAI agent performs bounded searches, calls summaries via tools, and synthesizes a typed final report with citations.
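In the repo, Fenic's MCP server discovers the registered tools from the session catalog and serves them itself. As a rough stand‑in (not Fenic's API), here is what exposing one such tool over streamable HTTP looks like with the reference MCP Python SDK; the tool name mirrors ours, the body is a stub, and the `port` keyword is an assumption about the SDK's settings:

```python
# Rough stand-in for the Fenic MCP server, using the reference MCP Python SDK
# (FastMCP) instead of Fenic's catalog-driven discovery.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("hn-research", port=8080)  # port kwarg assumed; SDK default differs

@mcp.tool()
def search_stories(pattern: str) -> list[dict]:
    """Regex search across titles, URLs, story text, and comments."""
    ...  # stub: query the precomputed tables and rank matches

if __name__ == "__main__":
    mcp.run(transport="streamable-http")  # serves the endpoint at /mcp
```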
Why Denormalize Once?
Pipeline
- Load base tables from Hugging Face into Fenic (`comments`, `items`, `stories`, `users`, etc.).
- Denormalize recursive relationships into lookup tables:
  - `comment_to_story`: fast comment→story back‑references for search.
  - `story_threads`: precomputed tree with a zero‑padded `path` for stable ordering (see the sketch after this list).
- Format discussions into `story_discussions`: ready‑to‑summarize markdown transcripts with story metadata and counts.
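The zero‑padded `path` is what makes ordering stable: sorting the path strings lexicographically reproduces depth‑first discussion order. A minimal sketch of the idea in plain Python; the field names and padding width are illustrative, not the exact Fenic schema:

```python
# Minimal sketch of zero-padded thread paths: lexicographic string sort
# reproduces depth-first discussion order.

def child_path(parent_path: str, sibling_rank: int, width: int = 6) -> str:
    """Extend a parent's path with this comment's zero-padded sibling rank."""
    return f"{parent_path}/{sibling_rank:0{width}d}"

root = "000000"                      # the story node
first_reply = child_path(root, 0)    # "000000/000000"
second_reply = child_path(root, 1)   # "000000/000001"
nested = child_path(first_reply, 0)  # "000000/000000/000000"

# Lexicographic sort == depth-first traversal, even past 10 siblings,
# which unpadded paths ("0/2" vs "0/10") would get wrong.
assert sorted([second_reply, nested, first_reply]) == [
    first_reply, nested, second_reply,
]
```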
Rationale
- Curated context beats brute force: moving recursion out of the hot path avoids per‑request traversals and keeps planning simple.
- Deterministic conversation structure: the precomputed thread hierarchy gives the agent an accurate, deterministic view of each discussion; reconstructing on the fly is error‑prone and costly.
- Markdown transcripts avoid re‑formatting threads on every summary and fit naturally into typed responses.
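Formatting then becomes a one‑time pass over the already‑sorted rows. Here's an illustrative renderer (plain Python; the field names are ours, not the exact Fenic schema) that turns precomputed thread rows into a markdown transcript:

```python
# Illustrative renderer for story_discussions: turn precomputed thread rows
# (already sorted by zero-padded path) into a markdown transcript.

def render_discussion(story: dict, thread_rows: list[dict]) -> str:
    lines = [
        f"# {story['title']}",
        f"{story['url']} · {len(thread_rows)} comments",
        "",
    ]
    for row in thread_rows:
        depth = row["path"].count("/")  # nesting level from the path
        indent = "  " * depth
        lines.append(f"{indent}- **{row['by']}**: {row['text']}")
    return "\n".join(lines)
```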
The Three Tools That Matter
Why this tool surface works for LLMs
- Small and sharp: a tiny set of well‑named tools reduces planning errors.
- Consistent schemas: predictable parameter names and outputs enable reliable chaining.
- Clear semantics: each tool does one thing; models handle such tools far more reliably.
- `search_stories(pattern)`
  - Regex search across titles, URLs, story text, and all related comments.
  - Relevance ranking: title > URL > text > comments; then sort by date (a ranking sketch follows this list).
  - Uses `comment_to_story` to elevate comment matches to root stories without recursion.
- `read_story(story_id)`
  - Returns the story with hierarchical comments (one row per node) using `story_threads`.
  - Includes metadata (author karma, domain, counts) and stable ordering via zero‑padded `path`.
- `summarize_story(story_id, language='en', extra_instructions='')`
  - Delegates summarization to Fenic's `semantic.map` with a concise prompt, producing a typed `StorySummary`.
  - Offloads bulk inference out of the agent loop so the agent spends its tokens on reasoning and synthesis, not pasting long threads.
  - Emits: TL;DR, overview, key points, themes with stance, off‑topic themes, risks, actions, and sources. Skips the LLM if no comments exist.
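To illustrate the ranking rule, here's a compact plain‑Python version of the search logic. The real tool runs as a Fenic query over the precomputed tables, so the shapes and field names here are illustrative:

```python
import re

# Illustrative sketch of search_stories ranking: field priority
# title > url > text > comments, ties broken by recency.
FIELD_PRIORITY = {"title": 0, "url": 1, "text": 2, "comments": 3}

def search_stories(pattern: str, stories: list[dict]) -> list[dict]:
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for story in stories:
        # 'comments' holds pre-joined comment text via the comment_to_story lookup
        for field in ("title", "url", "text", "comments"):
            if rx.search(story.get(field) or ""):
                hits.append((FIELD_PRIORITY[field], -story["time"], story))
                break  # count only the best-matching field per story
    hits.sort(key=lambda h: (h[0], h[1]))  # best field first, then newest
    return [story for _, _, story in hits]
```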
How Does the Agent Think?
- System prompt constrains the flow:
  - Up to 5 searches with broad → refined patterns.
  - Summarize the most relevant stories.
  - Use `read_story` sparingly for metadata not present in summaries.
  - Synthesize a `DeepResearchReport` with citations.
- Output model (`DeepResearchReport`): question, method, key_findings, themes, controversies, sources, limitations (sketched below).
- Why this agent pattern works:
  - The agent plans and synthesizes; tools do the heavy lifting. This separation keeps chain‑of‑thought focused on reasoning.
  - Typed responses reduce drift and make synthesis deterministic.
  - We opted for a single agentic loop (not multiple specialized agents); specialized work like summarization is offloaded to Fenic tools via MCP.
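Here's a sketch of that contract with PydanticAI, assuming a recent release (the `output_type`/`run_sync`/`.output` API). The schema mirrors the field list above with `list[str]` simplifications, and the model string is illustrative:

```python
# Sketch of the typed output contract plus a bounded research agent.
# Assumes a recent PydanticAI release; MCP tool wiring is omitted.
from pydantic import BaseModel
from pydantic_ai import Agent

class DeepResearchReport(BaseModel):
    question: str
    method: str
    key_findings: list[str]
    themes: list[str]
    controversies: list[str]
    sources: list[int]        # HN story IDs for citations
    limitations: list[str]

agent = Agent(
    "openai:gpt-4o",          # illustrative; any supported model string works
    output_type=DeepResearchReport,
    system_prompt=(
        "Run at most 5 searches, broad to refined. Summarize the most "
        "relevant stories with the summarize_story tool. Use read_story "
        "sparingly. Cite HN story IDs in every finding."
    ),
)

# The real agent reaches search_stories/read_story/summarize_story over MCP.
report = agent.run_sync("What does HN think about Rust?").output
print(report.key_findings)
```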
What Trade‑offs Did We Choose?
- Curated context vs. on‑demand joins
  - Chosen: precompute `comment_to_story`, `story_threads`, `story_discussions` to keep recall/precision high and latency predictable.
  - Trade‑off: extra storage/time upfront; periodic refresh to ingest new data.
- Minimal tool surface vs. flexibility
  - Chosen: three LLM‑friendly tools to reduce planning errors and cost of misuse.
  - Note: we intentionally skipped semantic ranking; advanced LLMs combined with regex retrieval work well here.
- Offloading inference for single‑loop agents
  - Chosen: delegate summarization to a tool so the agent focuses tokens on reasoning and synthesis.
  - Operational note: Fenic manages the offloaded execution context; typed schemas make stitching reliable and easy to refine.
- Typed, structured outputs vs. free‑form prose
  - Chosen: schemas for summaries and final report to enable downstream analysis in Fenic.
  - Trade‑off: prompt/schema discipline needed; changes must be versioned.
- Flexible LLM configuration
  - Mix and match models for best results (e.g., a fast summarizer tool + a stronger reasoning model for synthesis), or run search/read without any LLM (see the sketch after this list).
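A sketch of what that split looks like; the model names and variable names below are ours, not the repo's actual configuration keys:

```python
# Illustrative model split: cheap + fast for bulk summarization,
# stronger for planning and synthesis. Names are assumptions, not repo config.
SUMMARIZER_MODEL = "openai:gpt-4o-mini"  # used by the summarize_story tool
RESEARCH_MODEL = "openai:gpt-4o"         # used by the PydanticAI agent

# search_stories and read_story are pure dataframe lookups; they need
# no LLM at all, so they work even when no model is configured.
```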
Results and Gotchas
- Results (illustrative)
  - Search returns relevant stories across titles and comments due to curated context and ranking.
  - Summaries surface themes, disagreements, and actionables with citations; typed outputs make synthesis consistent.
  - Offloading summarization keeps the agent's token budget focused on reasoning steps.
- Limitations
  - Dataset coverage is limited to 2025 by default; older discussions require extending ingestion.
  - Summaries vary with the chosen model; the upside is that you can trade off latency/cost vs. summary quality by swapping models.
Your Likely Questions, Answered
- Why not semantic search for discovery?
  - Regex is predictable, cheap, and transparent, and models do a great job with it here.
- How does offloading inference help?
  - Summarization runs inside a tool with its own limits, so the agent avoids pasting long threads and instead spends its tokens on reasoning and synthesis.
- Can I reproduce your results exactly?
  - Yes for search/read on the same dataset snapshot; summaries may vary by model and temperature.
- What about bias or missing context?
  - We always include story IDs for verification and encourage reading the originals.
What’s Next?
- Multi‑year ingestion for trend analysis across time.
- Streaming agent output to expose progress and partial findings.
- URLs in citations for instant verification from the CLI, including direct links to referenced comments.
- Evals and benchmarking: measure retrieval accuracy, summary quality, and cost per answer; publish the harness and metrics.
- Caching summaries: Fenic excels at batch inference, so there's no reason to summarize on demand as we do today. Instead, we can summarize threads during the data ingestion phase, which cuts latency dramatically and makes the agent more deterministic, since summaries stay the same across runs.
Tell Us Where We're Wrong

- Have ideas for better denormalization or alternative tool surfaces? Open a PR or ping us; citations and counter‑examples welcome.

Acknowledgments

- Code repository: typedef-ai/fenic-examples · hn_agent
- Dataset: Hugging Face Hacker News Dataset (https://huggingface.co/datasets/typedef-ai/hacker-news-dataset).
- Frameworks: Fenic (https://github.com/typedef-ai/fenic), PydanticAI (https://github.com/pydantic/pydantic-ai).
- Community: Hacker News (https://news.ycombinator.com) authors and commenters whose discussions power the insights.
Appendix: Example Research Runs
Example 1 — What does Hacker News think about “vibe coding”?
- Question: What does Hacker News think about “vibe coding”?
- Methods used:
  - Ran 3 broad-to-refined regex searches across titles/URLs/text/comments for vibe‑coding terms
  - Summarized 8 high‑signal threads spanning Show HN, news articles, and opinion posts
  - Synthesized recurring themes, concrete examples, and points of disagreement across discussions
- Key findings:
  - Speed and access are the big wins: powerful for rapid MVPs/demos, including non‑traditional builders.
  - Quality, maintainability, and tech debt are the main costs: outputs often require expensive cleanup and rework.
  - Senior devs as “AI babysitters”: review overhead, bloated PRs, and ownership/accountability issues are common.
  - Hype vs reality: strong skepticism toward “get rich quick” marketing; tools augment good devs but aren’t silver bullets.
  - Security/compliance/liability risks: concrete incidents include API key leaks; heightened caution in regulated domains.
  - Prototype strategy is contested: throw away vs evolve; risk of hardening fragile foundations.
  - Skills/craft concerns: critics warn of skill erosion; supporters cite real productivity gains.
  - Micro‑tools vs longevity: useful quick utilities vs largely throwaway for long‑lived projects.
- Controversies/debates:
  - Net effect on productivity: acceleration vs degraded quality and longer review cycles [45242788, 45245425].
  - Throwaway vs evolve: whether to discard vibe‑coded prototypes or harden them over time [45235458].
  - Determinism/reproducibility of LLM outputs for production workflows [45242788].
  - Labor market effects: empowerment vs displacement of junior roles; shifting responsibilities for seniors [45189965, 45242788].
  - Acceptability in regulated domains (e.g., health): educational value vs diagnostic risk/liability [45157020].
  - Is vibe coding genuine progress or mostly marketing hype? [45189965].
- Research limitations:
  - Time‑bounded sample around Sep 2025; space evolving quickly.
  - Search centered on explicit “vibe coding” phrasing; may miss related agent/copilot threads.
  - Used AI‑generated summaries; some nuance may be lost versus full‑tree reads.
  - One story had minimal/no summarized content available [45241001].
  - HN readership skews developer/tech; may not generalize.
Example 2 — What agentic framework is considered the most popular?
- Question: What agentic framework is considered the most popular (per HN discussions)?
- Methods used:
  - 5 regex searches for agent frameworks and ecosystem terms
  - Summarized 7 stories with discussions (max 8), focusing on adoption cues, comparisons, sentiment
  - Synthesized across stories: frequency of mention, interoperability, surrounding tooling, practical usage
  - Cited story IDs for evidence; avoided full‑thread dumps
- Key findings:
  - LangGraph emerges as the de facto popular agent framework/runtime in recent HN discussions.
  - LangChain and LlamaIndex remain common defaults for RAG pipelines/guardrails; continued ecosystem presence.
  - New/alternative frameworks appear with lighter traction (e.g., VoltAgent (45346636)).
  - Many practitioners orchestrate multi‑agent systems without heavyweight frameworks; when concrete runtimes are cited, LangGraph is frequent.
  - Signals of maturity around LangGraph: state‑machine orchestration narratives, persistence/correlation, ecosystem gravity.
- Controversies/debates:
  - Production‑readiness of agent swarms/framework‑heavy workflows; maintainability vs “code slop”.
  - Guardrails placement (ingest, retrieval‑time, output) in RAG+agents stacks using LangChain/LlamaIndex.
  - Complexity/maintenance burden of correctness pipelines (e.g., LangGraph + RAG + knowledge‑graph self‑validation).
  - Stack maturity trade‑offs: multi‑tool pipelines (AgentLightning, verl, vLLM, AgentOps, LangGraph) enable results but add complexity.
- Research limitations:
  - HN discussions are self‑selected; low‑comment posts limit generalizability.
  - Popularity inferred from HN mentions, integrations, tooling; not independent usage metrics.
  - Regex searches may miss synonyms (e.g., “orchestrator”, “workflow engine”) and non‑English mentions.
  - Some cited items had little/no discussion; relied on presence/linkage rather than sentiment depth.
Example 3 — What does Hacker News think about LangChain?
- Question: What does Hacker News think about LangChain?
- Methods used:
  - 5 regex searches for relevant stories/discussions, including broad and comparative queries
  - Selected 7 representative threads and used AI summaries to extract sentiments/experiences/comparisons
  - Synthesized cross‑story patterns; cited story IDs for traceability
- Key findings:
  - Overall sentiment mixed, leaning skeptical for production use; can feel heavy/brittle for some teams.
  - Useful ideas/patterns acknowledged: planning, tool‑use, subagents, and LangGraph‑style agent graphs.
  - Alternatives/lighter stacks frequently recommended: direct SDKs, DSPy, PydanticAI, Llamafile/llama.cpp, Vercel AI SDK.
  - Trust/ecosystem concerns: reliability, breaking changes, roadmap clarity, vendor lock‑in (IDs: 43676084, 44837601, 44838733).
  - Open SWE interest plus debates on licensing/openness (AGPL, hosted components) and desire for local‑first, CLI/self‑hosted agents (IDs: 44838733, 44837601).
  - MCP and security: powerful local tool integration vs auth/session/key‑exposure concerns (ID: 43676084).
  - Benchmark marketing skepticism: e.g., “10,000x faster” claims spark methodology debates; compare like‑for‑like (often LangGraph vs LangChain) (ID: 43274435).
  - Developers want better docs, testability, onboarding; concerns include prompts‑as‑logic and validating behavior (ID: 43434171).
  - Meta‑theme: hype vs substance; many frame news as incremental; urge pragmatic evaluation against simpler approaches (IDs: 44761299, 44348415).
- Controversies/debates:
  - LangChain overengineered/bloated vs helpful abstractions; adapters/primitives vs raw SDKs (IDs: 44348415, 44761299).
  - Are agent frameworks necessary for most use cases? Reliability and added failure modes debated (IDs: 44348415, 43274435).
  - Open SWE's licensing/posture (AGPL, hosted components) vs claims of being open‑source and local‑first (IDs: 44838733, 44837601).
  - MCP suitability for production: power vs security/auth gaps (ID: 43676084).
  - Benchmark claims and fair comparisons: measurement rigor, baseline selection; LangGraph often the more apt comparator than LangChain (ID: 43274435).
- Research limitations:
  - Emphasis on 2025‑era threads; earlier sentiment not deeply sampled.
  - Summaries rely on AI‑generated digests; may miss nuance from specific chains.
  - Full comment trees were not fetched for line‑by‑line verification; citations are story‑level.
  - HN audience skews developer‑practitioner; may overrepresent production pragmatism/skepticism.