
Semantic Layer 2025: MetricFlow vs Snowflake vs Databricks

Typedef Team

Which Semantic Layer Should You Choose in 2025? dbt MetricFlow vs Snowflake Semantic Views vs Databricks Unity Catalog Metric Views

TL;DR: All three solutions reached GA in 2025, but serve different architectures. Choose dbt MetricFlow for vendor-neutral, multi-warehouse flexibility; Snowflake Semantic Views for native Snowflake performance and simplicity; Databricks Metric Views for lakehouse-centric organizations. Performance is comparable across all three when used in their native environments. The real differentiator is your existing data stack.


Executive Summary: Key Findings

After evaluating maturity, features, performance, governance, and ecosystem integration across all three semantic layer solutions, here are the critical takeaways:

The Bottom Line

All three solutions are production-ready as of late 2025, but they target fundamentally different architectures:

  • dbt MetricFlow: Best for multi-cloud, multi-warehouse environments requiring vendor independence
  • Snowflake Semantic Views: Optimal for Snowflake-native organizations prioritizing simplicity and AI integration
  • Databricks Metric Views: Ideal for lakehouse architectures unifying data science, ML, and BI workloads

Performance Verdict

Performance is near-equivalent when each solution operates in its native environment. All three:

  • Generate optimized SQL rather than adding compute overhead
  • Leverage their respective engines' query optimization
  • Support complex metrics (ratios, distinct counts, derived metrics) correctly
  • Handle large-scale data with sub-second to few-second latency for typical BI queries

The performance difference lies not in raw speed, but in architecture fit: mismatched solutions create overhead through data movement or integration complexity.

Feature Completeness Scorecard

| Capability | dbt MetricFlow | Snowflake Semantic Views | Databricks Metric Views |
|---|---|---|---|
| Complex Metrics | ✅ Extensive (ratios, conversions, derived) | ✅ Good (formulas, derived) | ✅ Good (ratios, derived, distinct counts) |
| Multi-hop Joins | ✅ Automatic via semantic graph | ✅ Via relationship definitions | ✅ Star schema support |
| Time Grains | ✅ Native multi-grain support | ⚠️ Implicit (via dimensions) | ⚠️ Manual dimension definitions |
| Governance | ✅ Version control, lineage, tests | ✅ Native RBAC, audit logs | ✅ Unity Catalog (RBAC, lineage, certification) |
| Multi-warehouse | ✅ Yes (Snowflake, BQ, Databricks, etc.) | ❌ Snowflake only | ❌ Databricks only |
| AI/LLM Ready | ✅ GraphQL API, OSI standard | ✅ Cortex integration, OSI | ✅ Databricks Assistant integration |
| BI Tool Integration | ✅ Broad (JDBC, GraphQL) | ⚠️ Growing (native to Snowsight) | ⚠️ Growing (native to Databricks SQL) |

Quick Decision Matrix

Choose dbt MetricFlow if:

  • You use multiple data warehouses (Snowflake + BigQuery, for example)
  • You need metrics portable across BI tools and vendors
  • You already have a mature dbt practice
  • Open standards and avoiding lock-in are priorities
  • You want the most mature metric type support (conversion metrics, etc.)

Choose Snowflake Semantic Views if:

  • Your data warehouse is exclusively Snowflake
  • You want zero external dependencies or services
  • Native Cortex AI integration is valuable
  • Simplicity of deployment matters (no additional infrastructure)
  • You prefer point-and-click semantic modeling in Snowsight

Choose Databricks Metric Views if:

  • You're all-in on the Databricks Lakehouse
  • You need metrics unified across data science, ML, and BI
  • Unity Catalog is already your governance layer
  • You want domain-driven metric organization and discovery
  • Spark/Delta Lake is your primary analytical engine

Why Semantic Layers Matter Now

The Problem: The Metric Chaos Tax

Every data team knows this pain: the CFO asks "What was Q3 revenue?" and receives three different answers—from three different dashboards, all claiming to be correct. One dashboard calculated revenue as SUM(order_total), another as SUM(order_total - discounts), and a third excluded refunds.

This isn't just embarrassing. It erodes trust in data, slows decision-making, and creates a "metric reconciliation tax" where analysts spend more time explaining discrepancies than generating insights.

The traditional culprits:

  • BI tool silos: Each tool (Tableau, Looker, Mode) maintains its own metric logic
  • SQL copy-paste culture: Analysts duplicate metric definitions with subtle variations
  • No single source of truth: Metrics live in tribal knowledge, not governed code
  • AI hallucination risk: LLMs writing their own SQL create metrics that are confidently wrong

The Solution: Semantic Layers as the Metrics API

A semantic layer creates a governed API for metrics—defining business calculations once and exposing them consistently to every consumer, whether human or AI.

The value proposition:

  1. Consistency: One definition of "revenue" used everywhere
  2. Self-service: Business users query metrics without writing SQL
  3. AI-ready: LLMs fetch pre-defined metrics instead of guessing table joins
  4. Governance: Version control, lineage, and access control for metrics
  5. Performance: Optimized SQL generation eliminates redundant computation

The 2025 Inflection Point

Three factors made 2025 the "year of the semantic layer":

1. The Agentic Analytics Wave: LLMs are moving from "answer questions" to "run analyses." But an LLM writing its own SQL is unreliable—it hallucinates table names, botches joins, and produces inconsistent logic. Semantic layers give AI deterministic metric definitions, turning natural language queries from "interesting demos" into production tools.

2. Multi-tool Proliferation: Modern data stacks juggle 5-10+ BI and analytics tools. Maintaining consistent metrics across Tableau, Hex, notebooks, and custom apps is untenable without centralization.

3. Cloud Data Platform Maturity: Snowflake, Databricks, and BigQuery are now powerful enough to handle semantic layer computation at scale. Early semantic layers (like legacy OLAP cubes) required separate infrastructure; modern approaches push computation to the warehouse.

The Open Semantic Interchange (OSI) Initiative

In 2025, competitors became collaborators. dbt Labs, Snowflake, Salesforce (Tableau), and others launched OSI—an effort to standardize semantic layer definitions. The goal: define a metric once in vendor-neutral YAML, and have every tool (BI, AI, notebooks) consume it.

This comparison evaluates three implementations emerging from this ecosystem, each betting on a different strategy for semantic layer dominance.


Evaluation Methodology

How We Evaluated These Solutions

To ensure a fair, comprehensive comparison, we assessed each semantic layer across six dimensions:

1. Maturity & Stability

  • Release timeline (preview → GA)
  • Version stability (pre-1.0 vs production-hardened)
  • Community adoption signals
  • Vendor commitment (resourcing, roadmap)

2. Feature Completeness

  • Metric definition capabilities (simple aggregations → complex conversions)
  • Relationship modeling (join handling, multi-hop support)
  • Time-based analysis (time grains, period-over-period)
  • Extensibility (APIs, custom logic support)

3. Governance & Trust

  • Version control integration
  • Lineage and documentation
  • Access control mechanisms
  • Testing and validation capabilities

4. Performance & Scale

  • Query compilation approach
  • Execution engine integration
  • Large dataset handling
  • Caching and optimization strategies

5. Ecosystem Integration

  • BI tool compatibility (native connectors vs SQL passthrough)
  • AI/LLM readiness (APIs for agentic analytics)
  • Development workflows (CLI, notebooks, UI)

6. Real-World Usability

  • Learning curve for analysts
  • Deployment complexity
  • Operational overhead
  • Platform dependencies

What We Did NOT Test

Bias Disclosure: This comparison synthesizes vendor documentation, architecture analysis, and community feedback—but does not include hands-on performance benchmarks on identical datasets. Why?

  1. Apples-to-oranges problem: Each solution is tightly coupled to its platform. Running "identical" queries requires data replication across Snowflake, Databricks, and multiple dbt-connected warehouses—introducing confounding variables.

  2. Architectural differences: dbt MetricFlow generates SQL for external execution; Snowflake/Databricks execute natively. Comparing latency would measure warehouse performance more than semantic layer efficiency.

  3. Use case variance: Performance depends heavily on metric complexity, data volume, cluster size, and caching state—making synthetic benchmarks misleading.

Our approach instead: Analyze each solution's architectural approach to performance, identify theoretical advantages/constraints, and report real-world usage patterns from vendor case studies and community feedback.

Evaluation Period

  • Primary sources reviewed: November 2025 - January 2026
  • Software versions: dbt MetricFlow v0.209 (Oct 2025), Snowflake Semantic Views (GA Aug 2025), Databricks Metric Views (Public Preview June 2025, GA late 2025)
  • Documentation: Official vendor docs, Summit/conference presentations, OSI working group materials

Deep Dive: dbt MetricFlow

Architecture Overview

dbt MetricFlow operates as a query generation engine sitting between BI tools and data warehouses. It doesn't store or compute data itself—instead, it translates metric requests into optimized SQL that executes on your existing warehouse (Snowflake, BigQuery, Redshift, Databricks, etc.).

The flow:

  1. Analyst requests "revenue by customer, last 7 days"
  2. MetricFlow compiles this into a dataflow-based query plan
  3. Plan is optimized and translated to warehouse-specific SQL dialect
  4. SQL executes on the warehouse; results return via Arrow Flight
  5. Results flow back to the requesting tool (Tableau, Hex, custom app, etc.)

This architecture makes MetricFlow vendor-agnostic—your metrics travel with your dbt project, not locked to a single database.

Maturity Assessment

Timeline:

  • 2022: Transform launches MetricFlow as standalone product
  • Early 2023: dbt Labs acquires Transform, integrates MetricFlow
  • Late 2023: dbt Semantic Layer goes GA at Coalesce conference
  • October 2025: MetricFlow open-sourced under Apache 2.0

Current state: Version 0.209 (still pre-1.0), but production-ready in dbt Cloud with real-world usage across thousands of organizations. The 0.x versioning reflects rapid evolution, not instability—dbt is cautious about semantic versioning while features are actively expanding.

Adoption signal: The open-sourcing in 2025 demonstrates long-term commitment. dbt is contributing MetricFlow to the OSI initiative alongside Snowflake and Salesforce, positioning it as a potential industry standard.

Verdict: Mature enough for production, but expect frequent updates. Stability has improved significantly post-GA, though the fast-evolving nature means minor version bumps may introduce breaking changes.

Feature Deep Dive

Semantic Modeling Approach

MetricFlow uses semantic models defined in YAML files within your dbt project. A semantic model maps to a star schema pattern:

yaml
semantic_models:
  - name: orders
    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign
    measures:
      - name: order_total
        agg: sum
      - name: order_count
        agg: count
    dimensions:
      - name: order_date
        type: time

Entities are the keys that establish relationships—MetricFlow uses these to automatically perform joins. If you query a metric from orders with a dimension from customers, the engine knows to join on customer_id.

Metric Type Support

MetricFlow's standout feature: native support for complex metric types without custom SQL.

Simple metrics: Direct aggregations

yaml
metrics:
  - name: total_revenue
    type: simple
    measure: order_total

Ratio metrics: Numerator/denominator specified separately (avoiding the "sum of ratios" anti-pattern)

yaml
metrics:
  - name: average_order_value
    type: ratio
    numerator: total_revenue
    denominator: order_count

Derived metrics: Formulas combining other metrics

yaml
metrics:
  - name: revenue_per_customer
    type: derived
    expr: total_revenue / customer_count

Conversion metrics: Cohort-based metrics over time windows (e.g., "percent of users who made a purchase within 30 days of signup")

yaml
metrics:
  - name: signup_to_purchase_rate
    type: conversion
    conversion_type_params:
      entity: user_id
      calculation: conversion_rate
      window: 30 days
      base_measure: signups
      conversion_measure: purchases

This last type is particularly powerful—building conversion funnels manually in SQL is error-prone and verbose. MetricFlow handles the complex date windowing and cohort logic automatically.

Time Dimension Handling

MetricFlow treats time as a first-class dimension with native multi-grain support. Define a time dimension once:

yaml
dimensions:
  - name: order_date
    type: time
    type_params:
      time_granularity: day

Then query at any grain:

  • metric by order_date__day
  • metric by order_date__week
  • metric by order_date__month
  • metric by order_date__year

The engine automatically applies the appropriate DATE_TRUNC logic. No need to create separate "order_month" and "order_year" dimension columns.
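
For illustration, here is a sketch of the kind of SQL this produces for a monthly-grain request, using the orders model above (MetricFlow's actual output includes aliasing and subqueries, so treat this as conceptual):

sql
SELECT
  DATE_TRUNC('month', order_date) AS order_date__month,
  SUM(order_total) AS total_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)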

Join Intelligence

MetricFlow's semantic graph understands relationships and performs multi-hop joins automatically. Example:

Orders (fact) → Customers (dim) → Regions (dim)

If you query total_revenue by region_name, MetricFlow:

  1. Recognizes region_name lives in the regions table
  2. Traces the relationship path: orders.customer_id → customers.customer_id, then customers.region_id → regions.region_id
  3. Generates a two-hop join with correct cardinality handling (avoiding fan-out)

This prevents the common mistake of joining dimensions directly to facts and double-counting.
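
Conceptually, the generated two-hop join for total_revenue by region_name looks like the sketch below (assuming many-to-one relationships from orders to customers to regions; the real generated SQL is more verbose):

sql
SELECT
  r.region_name,
  SUM(o.order_total) AS total_revenue
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN regions r ON c.region_id = r.region_id
GROUP BY r.region_name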

Performance Architecture

Query Compilation Process

When a metric request arrives, MetricFlow:

  1. Parses the request (which metrics, which dimensions, which filters)
  2. Builds a dataflow graph representing the computation
  3. Optimizes the graph (push down filters, eliminate unnecessary joins)
  4. Generates SQL in the target warehouse's dialect

The SQL generation is smart about complex metrics. For a ratio metric, MetricFlow doesn't naively average pre-computed ratios (which would be wrong when grouping by dimensions). Instead, it aggregates the numerator and denominator within each group and divides the results:

sql
SELECT
  customer_region,
  SUM(order_total) / COUNT(DISTINCT order_id) as aov
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id
WHERE order_date >= CURRENT_DATE - 7
GROUP BY customer_region

For even more complex metrics (like conversion rates with time windows), it generates subqueries with correct date logic.

Execution and Transfer

Once compiled, the SQL executes entirely in the warehouse. MetricFlow adds near-zero compute overhead—the bottleneck is warehouse performance, not the semantic layer.

Results transfer via Arrow Flight SQL, a high-performance protocol that:

  • Streams large result sets efficiently
  • Reduces serialization overhead vs. JDBC/ODBC
  • Enables columnar data transfer (faster for analytical queries)

In practice, this means the "tax" of routing through dbt Cloud's API is minimal—a few milliseconds for small results, manageable even for large results due to Arrow's efficiency.

The Performance Win: Centralization

The real performance benefit isn't faster individual queries—it's eliminating redundant computation. Without a semantic layer:

  • Tableau dashboard computes SUM(revenue) with complex filters
  • Mode dashboard computes the same metric slightly differently
  • Hex notebook recomputes with yet another SQL variant
  • Each hits the warehouse separately, often with full table scans

With MetricFlow:

  • One metric definition
  • One optimized SQL query plan
  • Potential for shared query results (if BI tools coordinate, or if warehouse result caching kicks in)
  • Analysts aren't re-inventing SQL for the same metrics

Governance and Lineage

Metrics as Code

MetricFlow definitions live in YAML files within your dbt project. This means:

✅ Version control: Metrics are committed to git, with full history and blame
✅ Code review: Metric changes go through PR review before merging
✅ CI/CD integration: Test metric definitions before deploying to production
✅ Documentation: Inline descriptions that appear in dbt docs and BI tools

Example metric with governance metadata:

yaml
metrics:
  - name: net_revenue
    label: "Net Revenue (after refunds)"
    description: |
      Total order revenue minus refunds and discounts.
      Used for all financial reporting. Do not confuse
      with 'gross_revenue' metric.
    type: derived
    expr: gross_revenue - refunds - discounts
    meta:
      owner: finance_team
      certified: true

Lineage Visualization

dbt Explorer shows metrics as nodes in the lineage DAG:

raw_sources → staging_models → orders (semantic model) → net_revenue (metric)

Clicking on a metric shows:

  • Upstream dependencies (which tables/models feed it)
  • Downstream usage (which dashboards query it)
  • Definition and documentation
  • Recent changes (git history)

This lineage is critical for trust—analysts can verify exactly which raw data feeds each metric.

Access Control

Access control is warehouse-level, not metric-level. The dbt Cloud service role needs SELECT on underlying tables. Individual users' permissions flow from the tool they're using:

  • Querying via Tableau: Tableau's database credentials determine access
  • Querying via the GraphQL API: API keys can be scoped to specific environments
  • Querying via JDBC: Standard database authentication

There's no native "user X can see revenue but not cost" at the metric layer—those restrictions would be enforced by warehouse views or row-level security.
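
As a sketch of pushing such a restriction down to the warehouse (Snowflake syntax shown; object names are hypothetical), you could expose a view that omits cost columns and point the semantic model at it:

sql
-- Hypothetical secure view: analysts see revenue fields but not cost
CREATE SECURE VIEW analytics.orders_no_cost AS
SELECT order_id, customer_id, order_date, order_total
FROM analytics.orders;  -- cost columns intentionally omitted

GRANT SELECT ON VIEW analytics.orders_no_cost TO ROLE analyst_role;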

Ecosystem Integration

BI Tool Connectivity

Two API options:

1. JDBC (Arrow Flight SQL): Acts like a database connection. Tools see a virtual database with "tables" (actually metric views):

sql
SELECT
  customer_region,
  total_revenue,
  order_count
FROM semantic_layer.metrics
WHERE order_date >= '2025-01-01'

Any SQL-aware tool works: Tableau, Power BI, Excel, DBeaver, even Google Sheets plugins.

2. GraphQL API: Structured queries for programmatic access:

graphql
{
  metrics(
    metrics: ["total_revenue", "order_count"]
    dimensions: ["customer_region"]
    where: "order_date >= '2025-01-01'"
  ) {
    customer_region
    total_revenue
    order_count
  }
}

Perfect for custom web apps, chatbots, or AI agents.

Current Integrations

Native support:

  • Hex
  • Mode
  • Deepnote
  • Preset (Apache Superset)
  • ThoughtSpot

Via JDBC: Any tool supporting JDBC/ODBC (Tableau, Looker, Sigma, etc.)

Custom: Python library (dbt-metricflow) for notebooks or scripts

AI and Agentic Analytics

MetricFlow's killer feature for AI: deterministic metric definitions that LLMs can reference by name.

The problem with LLM-generated SQL:

  • Hallucinates table names
  • Botches join logic
  • Inconsistent definitions (asks for "revenue" but writes different SQL each time)

The MetricFlow solution:

  • LLM knows "revenue" is a defined metric
  • Asks the semantic layer via GraphQL: query { metric(name: "revenue") }
  • Gets correct result without writing SQL

dbt reports 83% accuracy in internal tests of natural language queries when backed by the Semantic Layer, vs. ~40% when the LLM writes raw SQL.

Integration pattern:

User: "What was Q3 revenue by region?"
  ↓
LLM (with MetricFlow plugin):
  1. Identifies metric: "revenue"
  2. Identifies dimension: "region"
  3. Identifies time filter: "Q3"
  4. Calls MetricFlow GraphQL API
  ↓
MetricFlow: Generates SQL, executes, returns results
  ↓
LLM: Formats answer with context

Strengths

✅ Platform independence: Works across Snowflake, BigQuery, Databricks, Redshift—metrics aren't locked to one vendor
✅ Rich metric types: Best-in-class support for complex metrics (conversions, ratios, derived)
✅ Metrics as code: Full dbt workflow (git, tests, CI/CD, documentation)
✅ Flexible APIs: Both SQL and GraphQL for broad tool compatibility
✅ Open standard: Apache 2.0 license, OSI member—future-proof against lock-in
✅ Mature ecosystem: Large dbt community, established best practices

Limitations

❌ Requires dbt infrastructure: Need dbt Cloud (paid) or a self-hosted semantic layer service
❌ Learning curve: YAML semantic model syntax is new; requires training
❌ Extra network hop: Queries route through the dbt Cloud API (mitigated by Arrow, but adds latency)
❌ Warehouse-level access control: No native row/column-level security per metric
❌ Still evolving: 0.x versioning means potential breaking changes in updates

Ideal Use Cases

✅ Best fit:

  • Multi-cloud organizations (Snowflake + BigQuery)
  • Teams already using dbt for transformations
  • Organizations prioritizing vendor neutrality
  • Data platforms serving multiple BI tools
  • AI/LLM-heavy analytics use cases (chatbots, agents)

❌ Poor fit:

  • Teams without dbt expertise or willingness to adopt
  • Single-warehouse shops wanting native simplicity
  • Organizations requiring metric-level row security
  • Teams wanting GUI-only metric management (no code)

Deep Dive: Snowflake Semantic Views

Architecture Overview

Snowflake takes a radically different approach: semantic definitions live inside the database as first-class objects, not in external tooling. A Semantic View is a new Snowflake object type (alongside tables, views, and stages) that encapsulates your business logic.

The flow:

  1. Define a semantic view via DDL, UI wizard, or YAML import
  2. Semantic view becomes a queryable object in your Snowflake schema
  3. Users query with SELECT * FROM SEMANTIC_VIEW(...)
  4. Snowflake's query planner rewrites this into optimized SQL on physical tables
  5. Results come directly from Snowflake—no middleware

This native integration means zero external infrastructure and tight coupling with Snowflake's security and performance features.

Maturity Assessment

Timeline:

  • Q1 2025: Semantic Views announced in preview
  • June 2025: GA announced at Snowflake Summit
  • August 2025: Querying functionality reaches GA
  • Late 2025: Cross-account sharing remains in private preview

Current state: Core functionality (define and query semantic views) is production-ready as of August 2025. Advanced features like Data Sharing for semantic views are still maturing.

Adoption signal: Snowflake positioned Semantic Views as a strategic pillar at Summit 2025, co-announced with the OSI initiative. Integration with Cortex AI (Snowflake's LLM platform) demonstrates long-term commitment.

Verdict: Production-ready for internal use within a single Snowflake account. Early days for cross-account or partner ecosystem integration, but improving rapidly.

Feature Deep Dive

Semantic Modeling Approach

Snowflake's model centers on logical tables that abstract physical tables:

Creating a semantic view:

sql
CREATE SEMANTIC VIEW sales_analytics AS
  LOGICAL TABLES
    orders (
      FACTS order_amount, tax_amount
      DIMENSIONS customer_id, product_id, order_date
      PRIMARY KEY order_id
    ),
    customers (
      DIMENSIONS customer_id, region, segment
      PRIMARY KEY customer_id
    ),
    products (
      DIMENSIONS product_id, category, brand
      PRIMARY KEY product_id
    )
  RELATIONSHIPS
    orders.customer_id = customers.customer_id,
    orders.product_id = products.product_id
  METRICS
    total_revenue AS SUM(order_amount),
    avg_order_value AS SUM(order_amount) / COUNT(DISTINCT order_id),
    revenue_per_customer AS SUM(order_amount) / COUNT(DISTINCT customer_id)

Key concepts:

  • Facts: Numeric columns that can be aggregated (e.g., order_amount)
  • Dimensions: Categorical attributes for slicing (e.g., region, product_category)
  • Time Dimensions: Special dimensions for temporal analysis (period-over-period, trending)
  • Metrics: Named aggregations over facts, with defined formulas
  • Relationships: Join paths between logical tables (one-to-many, many-to-one)

Alternative Definition: YAML Import

For teams wanting version control, Snowflake supports YAML specs (the same format used by Cortex Analyst):

yaml
semantic_model:
  name: sales_analytics
  tables:
    - name: orders
      base_table: raw.orders
      measures:
        - name: revenue
          aggregation: sum
          expr: order_amount
      dimensions:
        - name: order_date
          expr: order_timestamp::date
    - name: customers
      base_table: raw.customers
      dimensions:
        - name: region
        - name: segment
  relationships:
    - from: orders.customer_id
      to: customers.customer_id

Upload via:

sql
CREATE SEMANTIC VIEW sales_analytics
FROM @my_stage/semantic_model.yaml;

This bridges the "infrastructure as code" gap, though Snowflake doesn't yet have native git integration like dbt.

Metric Definition Capabilities

Simple aggregations:

sql
METRICS
  total_sales AS SUM(order_amount),
  order_count AS COUNT(order_id)

Calculated metrics (formulas):

sql
METRICS
  net_revenue AS SUM(order_amount * (1 - discount_pct))

Ratios (computed correctly across dimensions):

sql
METRICS
  avg_order_value AS SUM(order_amount) / COUNT(DISTINCT order_id)

Snowflake handles the aggregation logic properly—it won't naively compute AVG(order_amount) if the grain changes.

Derived metrics (building on other metrics):

sql
METRICS
  profit AS revenue - cost,
  profit_margin AS profit / revenue

Complex expressions (multi-step calculations):

sql
METRICS
  churn_rate AS
    COUNT(DISTINCT CASE WHEN last_order_date < CURRENT_DATE - 90 THEN customer_id END)
    / COUNT(DISTINCT customer_id)

Time Dimension Handling

Time dimensions are marked explicitly:

sql
DIMENSIONS
  order_date AS order_timestamp::date
    TIME_DIMENSION

Queries can then use time-series functions:

  • Period-over-period comparisons (Q3 2025 vs Q3 2024)
  • Rolling windows (30-day moving average)
  • Time-based cohort analysis

While Snowflake doesn't have dbt's automatic grain shortcuts (order_date__month), you can define multiple time dimensions:

sql
DIMENSIONS
  order_date AS order_timestamp::date,
  order_month AS DATE_TRUNC('month', order_timestamp),
  order_quarter AS DATE_TRUNC('quarter', order_timestamp)

Or let BI tools handle time bucketing.

Join Handling

Snowflake's semantic engine performs automatic joins based on relationship definitions. For a query like:

sql
SELECT * FROM SEMANTIC_VIEW(
  sales_analytics
  DIMENSIONS customers.region, products.category
  METRICS total_revenue
)
WHERE products.category = 'Electronics'

The engine:

  1. Identifies required tables: orders (has the fact), customers (has region), products (has category)
  2. Constructs joins: orders → customers on customer_id, orders → products on product_id
  3. Applies filter: WHERE products.category = 'Electronics'
  4. Aggregates: SUM(orders.order_amount) GROUP BY customers.region, products.category

Multi-hop joins are supported (e.g., orders → stores → regions), and Snowflake prevents fan-out by respecting cardinality in the relationship definitions.
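
Assuming stores and regions were added as logical tables with the corresponding relationships, a two-hop query would follow the same pattern as the example above:

sql
SELECT * FROM SEMANTIC_VIEW(
  sales_analytics
  DIMENSIONS regions.region_name
  METRICS total_revenue
)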

Performance Architecture

Native Query Optimization

Semantic Views execute entirely within Snowflake's engine, using the same query optimizer as regular SQL. When you issue a SEMANTIC_VIEW() query:

  1. Parsing: Snowflake parses the special syntax
  2. Rewriting: The optimizer expands it into standard SQL (joins, aggregations, filters)
  3. Optimization: Standard Snowflake optimizations apply:
    • Micro-partition pruning (skip irrelevant data files)
    • Column pruning (only read necessary columns)
    • Join order optimization
    • Predicate pushdown
  4. Execution: Distributed execution on Snowflake's compute clusters

Example rewrite:

User query:

sql
SELECT * FROM SEMANTIC_VIEW(
  sales_analytics
  DIMENSIONS region
  METRICS total_revenue
)

Generated SQL (conceptual):

sql
SELECT
  c.region,
  SUM(o.order_amount) as total_revenue
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.region

The generated SQL is indistinguishable from hand-written SQL in performance—no overhead beyond what you'd write manually.

Caching Strategies

Snowflake's result cache works transparently with semantic views. If you run the same SEMANTIC_VIEW() query twice (and underlying data hasn't changed), the second query returns instantly from cache.

Warehouse cache (local SSD) also applies—frequently accessed micro-partitions stay "warm" for faster reads.

For extremely expensive metrics queries repeatedly, you could create a materialized view on top:

sql
CREATE MATERIALIZED VIEW monthly_revenue_snapshot AS
SELECT * FROM SEMANTIC_VIEW(
  sales_analytics
  DIMENSIONS order_month, region
  METRICS total_revenue
);

But this defeats the flexibility purpose—materialized views lock you into specific dimensions.

Correctness for Complex Aggregations

Snowflake's semantic engine handles tricky aggregations correctly:

Distinct counts across joins:

sql
METRICS
  unique_customers AS COUNT(DISTINCT customer_id)

If querying by product category (requiring a join from orders → products), Snowflake ensures the distinct count happens after the join, not before.

Ratios with different grains:

sql
METRICS
  revenue_per_product AS SUM(order_amount) / COUNT(DISTINCT product_id)

Snowflake computes the numerator and denominator separately, then divides—avoiding the "sum of ratios" mistake.
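
Conceptually, the rewrite computes both aggregates in the same grouped pass and divides afterward (a sketch based on the tables defined earlier):

sql
SELECT
  p.category,
  SUM(o.order_amount) / NULLIF(COUNT(DISTINCT o.product_id), 0) AS revenue_per_product
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY p.category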

Scale and Latency

Performance scales with warehouse size:

  • Small warehouse: Sub-second for simple metrics on moderate data (<1M rows)
  • Medium warehouse: Few seconds for complex metrics on large data (10M-100M rows)
  • Large/X-Large: Handles billions of rows with parallel processing

Real-world example: A customer with 500M fact rows and 20M dimension rows reported:

  • Simple metric by single dimension: 2-3 seconds (Medium warehouse)
  • Complex ratio across 3-hop join: 8-12 seconds (Large warehouse)
  • Same queries with warm cache: <1 second

The performance is equivalent to well-written SQL—the semantic layer doesn't add compute overhead.

Governance and Security

Access Control

Semantic Views support standard Snowflake RBAC:

sql
GRANT SELECT ON SEMANTIC VIEW sales_analytics TO ROLE analyst_role;

The key security benefit: users don't need access to underlying tables. They query the semantic view, which abstracts raw data access.

Example:

  • Raw table: raw.orders (contains sensitive PII)
  • Semantic view: sales_analytics (exposes only aggregated metrics)
  • Analyst role: SELECT on sales_analytics, no access to raw.orders

This enables governed self-service—analysts explore metrics without seeing or misusing raw data.

Row-level security: Applied at the table level (via secure views or row access policies). If raw.orders has RLS filtering by region, those filters automatically flow through to semantic view queries.
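
A minimal sketch of such a policy (the policy and mapping-table names are hypothetical); once attached, it constrains every semantic view query issued by the affected roles:

sql
CREATE ROW ACCESS POLICY raw.orders_region_policy
  AS (region STRING) RETURNS BOOLEAN ->
    region IN (SELECT allowed_region
               FROM security.role_region_map
               WHERE role_name = CURRENT_ROLE());

ALTER TABLE raw.orders
  ADD ROW ACCESS POLICY raw.orders_region_policy ON (region);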

Metadata and Auditing

Snowflake extends its Information Schema with semantic objects:

sql
-- List all metrics in a semantic view
SHOW SEMANTIC METRICS IN sales_analytics;

-- Query metadata
SELECT metric_name, aggregation, description
FROM INFORMATION_SCHEMA.SEMANTIC_METRICS
WHERE semantic_view_name = 'SALES_ANALYTICS';

Audit logging: All SEMANTIC_VIEW() queries appear in ACCOUNT_USAGE.QUERY_HISTORY, so you can track:

  • Which users query which metrics
  • Query frequency and patterns
  • Performance bottlenecks
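
For example, a sketch of an audit query that surfaces who queries a given semantic view and how expensive those queries are (standard Account Usage columns):

sql
SELECT
  user_name,
  COUNT(*)                AS query_count,
  AVG(total_elapsed_time) AS avg_elapsed_ms
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE query_text ILIKE '%SEMANTIC_VIEW(%SALES_ANALYTICS%'
  AND start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY user_name
ORDER BY query_count DESC;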

Lineage

Snowflake's Account Usage views capture semantic dependencies:

sql
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES
WHERE referenced_object_name = 'SALES_ANALYTICS'
  AND referenced_object_type = 'SEMANTIC_VIEW';

This shows which tables feed the semantic view, though it's more manual than dbt's visual lineage DAG.

Documentation and Discovery

Add descriptions directly in DDL:

sql
CREATE SEMANTIC VIEW sales_analytics
COMMENT = 'Certified revenue metrics for Sales team reporting'
AS ...
  METRICS
    total_revenue AS SUM(order_amount)
      COMMENT = 'Sum of all order amounts, excluding refunds'

Comments appear in Snowsight's object explorer and can be queried programmatically.

Synonyms for AI: Semantic views support business-friendly aliases:

sql
METRICS
  total_revenue AS SUM(order_amount)
    SYNONYM ('sales', 'revenue', 'total sales')

When an LLM queries via Cortex, it can map "sales" → total_revenue metric automatically.

Ecosystem Integration

Snowsight UI

Snowflake's web UI provides:

  • Visual semantic model builder (drag-and-drop table relationships)
  • Metric browser (explore available metrics, see definitions)
  • Query builder (select dimensions/metrics without writing SQL)

AI-assisted modeling: Snowflake's Cortex can suggest:

  • Likely join keys between tables
  • Common dimension/fact classifications
  • Auto-generated metric descriptions

This lowers the barrier for non-technical users to create semantic views.

BI Tool Compatibility

Current state (late 2025):

✅ Native support:

  • Snowsight (Snowflake's built-in BI)
  • Any tool supporting custom SQL (can write SEMANTIC_VIEW() queries)

🚧 In progress:

  • Tableau (via OSI partnership)
  • Sigma Computing
  • ThoughtSpot
  • Power BI (via ODBC with custom SQL)

Future vision: BI tools will offer native connectors that:

  1. Connect to Snowflake
  2. Detect available semantic views
  3. Present metrics as draggable fields (like connecting to an OLAP cube)
  4. Generate SEMANTIC_VIEW() queries transparently

Until then, users can write semantic view queries in SQL editors.

Cortex AI Integration

Snowflake's Cortex Analyst (LLM-powered SQL generation) uses semantic views as its source of truth:

Without semantic views:

User: "What was Q3 revenue by region?"
LLM: Writes SQL (might use wrong table, forget filters, etc.)
Accuracy: ~40-50%

With semantic views:

User: "What was Q3 revenue by region?"
Cortex:
  1. Identifies semantic view: sales_analytics
  2. Maps "revenue" → total_revenue metric
  3. Maps "region" → customers.region dimension
  4. Generates SEMANTIC_VIEW query
Accuracy: ~85-90% (per Snowflake internal tests)

REST API for external AI:

python
import requests

response = requests.post(
  'https://<account>.snowflakecomputing.com/api/v2/cortex/analyst',
  json={
    'question': 'What was Q3 revenue by region?',
    'semantic_model': 'sales_analytics'
  },
  headers={'Authorization': f'Bearer {token}'}
)

results = response.json()['data']

This enables building chatbots or agents that query Snowflake with natural language while using governed metrics.

Strengths

✅ Zero external infrastructure: No separate service to deploy or maintain
✅ Native performance: Fully optimized by Snowflake's engine, no middleware latency
✅ Tight security integration: RBAC, row-level security, audit logs all native
✅ AI-first design: Synonyms and Cortex integration purpose-built for LLMs
✅ Snowsight UI: Point-and-click modeling for non-coders
✅ OSI member: Semantic definitions intended to be portable (future)

Limitations

❌ Snowflake-only: Can't use semantic views on data in BigQuery, Databricks, etc.
❌ New feature with gaps: Cross-account sharing and some integrations still maturing
❌ Proprietary syntax: SEMANTIC_VIEW() is Snowflake-specific SQL
❌ Version control overhead: YAML export works, but no native git integration
❌ BI tool support pending: Most tools can't yet "discover" metrics natively

Ideal Use Cases

✅ Best fit:

  • Snowflake-native organizations (all data in Snowflake)
  • Teams wanting simplicity (no external semantic layer service)
  • Cortex AI users (tight integration)
  • Governance-first environments (hide raw data, expose metrics)
  • Non-technical users creating metrics (Snowsight wizard)

❌ Poor fit:

  • Multi-warehouse environments (Snowflake + BigQuery)
  • Organizations requiring metric portability across platforms
  • Teams deeply invested in dbt workflows (no dbt integration yet)
  • Use cases requiring immediate, broad BI tool support

Deep Dive: Databricks Unity Catalog Metric Views

Architecture Overview

Databricks positions Metric Views as the semantic layer for the Lakehouse—unifying data science, ML, and BI under one governed metric framework. Unlike traditional BI-focused semantic layers, Databricks aims for metrics that work equally well in:

  • SQL dashboards (Databricks SQL, Tableau)
  • Notebooks (Python/Scala/R data exploration)
  • ML pipelines (feature engineering)
  • AI agents (LLM-powered analytics)

The flow:

  1. Define a metric view in Unity Catalog (via YAML, SQL DDL, or UI)
  2. Metric view appears as a catalog object alongside tables and views
  3. Users query with SELECT ... MEASURE(metric_name) ...
  4. Spark SQL / Photon engine compiles and executes the query
  5. Results return directly from Databricks compute

This lakehouse-native approach means metrics are first-class citizens in the data layer, not a separate BI construct.

Maturity Assessment

Timeline:

  • June 2025: Announced at Data + AI Summit, enters Public Preview
  • Late 2025: General Availability rollout (AWS, Azure, GCP)
  • Q4 2025: Partner integrations (Tableau, Sigma, etc.) begin

Current state: GA as of late 2025, making it the newest of the three solutions. Public Preview feedback from design partners (Virgin Atlantic, others) helped refine the feature before GA.

Adoption signal: Unity Catalog itself is widely adopted (replaced legacy Hive Metastore for most modern Databricks deployments). Metric Views extend this proven governance foundation.

Verdict: Production-ready for early adopters within Databricks. Ecosystem integrations are ramping up rapidly, but community knowledge and best practices are still forming.

Feature Deep Dive

Semantic Modeling Approach

Databricks uses metric views defined in YAML or via SQL. A metric view contains:

YAML definition example:

yaml
metric_view:
  name: sales_metrics
  source_table: main.sales.orders
  joins:
    - table: main.sales.customers
      on: orders.customer_id = customers.customer_id
    - table: main.sales.products
      on: orders.product_id = products.product_id

  measures:
    - name: total_revenue
      expr: SUM(order_amount)
      description: "Total order revenue before discounts"

    - name: avg_order_value
      expr: SUM(order_amount) / COUNT(DISTINCT order_id)
      description: "Average revenue per order"

    - name: unique_customers
      expr: COUNT(DISTINCT customer_id)

  dimensions:
    - name: customer_region
      expr: customers.region

    - name: product_category
      expr: products.category

    - name: order_date
      expr: DATE(order_timestamp)

Register in Unity Catalog:

sql
CREATE METRIC VIEW main.sales.sales_metrics
FROM 'dbfs:/metrics/sales_metrics.yaml';

Or define inline:

sql
CREATE METRIC VIEW main.sales.sales_metrics AS
SELECT
  SUM(order_amount) AS total_revenue,
  COUNT(DISTINCT order_id) AS order_count,
  customers.region AS customer_region,
  products.category AS product_category
FROM main.sales.orders
JOIN main.sales.customers ON orders.customer_id = customers.customer_id
JOIN main.sales.products ON orders.product_id = products.product_id;

Measure Definition Capabilities

Simple aggregations:

yaml
measures:
  - name: total_sales
    expr: SUM(amount)

Complex calculations:

yaml
measures:
  - name: net_revenue
    expr: SUM(amount * (1 - discount_rate) - refund_amount)

Distinct counts (handled correctly across joins):

yaml
measures:
  - name: unique_products_sold
    expr: COUNT(DISTINCT product_id)

Ratios (computed properly at any dimension grain):

yaml
measures:
  - name: avg_margin
    expr: SUM(revenue - cost) / SUM(revenue)

Derived measures (referencing other measures):

yaml
measures:
  - name: revenue_per_customer
    expr: total_revenue / unique_customers

The engine ensures derived measures are computed after base measures aggregate, avoiding logic errors.
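
For instance, assuming the revenue_per_customer measure above is added to sales_metrics, it can be queried alongside its inputs and will aggregate correctly at whatever grain you group by:

sql
SELECT
  customer_region,
  MEASURE(total_revenue)        AS total_revenue,
  MEASURE(unique_customers)     AS unique_customers,
  MEASURE(revenue_per_customer) AS revenue_per_customer
FROM main.sales.sales_metrics
GROUP BY customer_region;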

Dimension Handling

Dimensions can be direct columns or expressions:

yaml
dimensions:
  - name: order_year
    expr: YEAR(order_date)

  - name: customer_tier
    expr: CASE
            WHEN lifetime_value > 10000 THEN 'VIP'
            WHEN lifetime_value > 1000 THEN 'Standard'
            ELSE 'Basic'
          END

Time dimensions are treated like any other dimension (no special time grain shortcuts yet). For multi-grain time analysis, define multiple dimensions:

yaml
dimensions:
  - name: order_date
    expr: DATE(order_timestamp)

  - name: order_month
    expr: DATE_TRUNC('month', order_timestamp)

  - name: order_quarter
    expr: DATE_TRUNC('quarter', order_timestamp)

Or let analysts apply time functions in queries:

sql
SELECT
  DATE_TRUNC('month', order_date) AS month,
  MEASURE(total_revenue)
FROM main.sales.sales_metrics
GROUP BY month;

Join Handling: Star Schema Support

Metric views support multi-table joins for star/snowflake schemas:

yaml
joins:
  - table: dim_customers
    on: fact_orders.customer_id = dim_customers.customer_id

  - table: dim_products
    on: fact_orders.product_id = dim_products.product_id

  - table: dim_stores
    on: fact_orders.store_id = dim_stores.store_id

  - table: dim_regions
    on: dim_stores.region_id = dim_regions.region_id  # Multi-hop

When you query a measure with a dimension from dim_regions, Databricks:

  1. Traces the join path: fact_orders → dim_stores → dim_regions
  2. Constructs a two-hop join with correct cardinality
  3. Applies the measure aggregation after joins complete

This prevents fan-out (double-counting) that occurs with naive joins.
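
A sketch of slicing by the two-hop dimension (assuming a region_name dimension with expr: dim_regions.region_name is declared in the metric view):

sql
SELECT
  region_name,
  MEASURE(total_revenue) AS total_revenue
FROM main.sales.sales_metrics
GROUP BY region_name;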

Performance Architecture

Spark SQL + Photon Execution

Metric view queries execute on Databricks SQL warehouses, which use:

  • Spark SQL: Distributed query engine with Catalyst optimizer
  • Photon: Vectorized execution engine (2-3× faster than standard Spark for analytical queries)
  • Delta Lake: Columnar storage with ACID guarantees and time travel

Query compilation process:

User query:

sql
SELECT
  customer_region,
  MEASURE(total_revenue),
  MEASURE(unique_customers)
FROM main.sales.sales_metrics
WHERE product_category = 'Electronics'
GROUP BY customer_region;

Databricks:

  1. Parses the MEASURE() syntax
  2. Expands measures into their definitions (SUM(order_amount), COUNT(DISTINCT customer_id))
  3. Injects joins from the metric view spec
  4. Optimizes via Catalyst (predicate pushdown, join reordering)
  5. Executes with Photon (vectorized aggregation)

Generated physical plan (conceptual):

sql
SELECT
  c.region AS customer_region,
  SUM(o.order_amount) AS total_revenue,
  COUNT(DISTINCT o.customer_id) AS unique_customers
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE p.category = 'Electronics'
GROUP BY c.region;

The performance is equivalent to writing this SQL manually—the metric view adds zero compute overhead.

Optimization Strategies

Data skipping: Delta Lake tracks min/max stats per file. Queries with filters (WHERE order_date >= '2025-01-01') skip irrelevant files automatically.

Predicate pushdown: Filters push down to the storage layer before data is read into memory.

Join optimization:

  • Small dimension tables are broadcast to all executors (avoiding shuffle)
  • Join order is optimized based on table statistics
  • Bloom filters eliminate non-matching rows early

Vectorized aggregation (Photon): SUM, COUNT, AVG operations process data in columnar batches, leveraging CPU SIMD instructions.

Adaptive Query Execution (AQE): Spark SQL adjusts the query plan mid-execution based on runtime statistics (e.g., dynamically choosing broadcast vs. shuffle joins).

Caching and Materialization

Delta cache: Local SSD cache on cluster nodes. Frequently accessed data (e.g., small dimension tables) stays in cache, reducing S3/ADLS reads.

Query result cache: Identical queries return instantly if underlying data hasn't changed (similar to Snowflake's result cache).

Materialized views (optional): For extremely expensive metrics queried frequently at fixed dimensions:

sql
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
  order_month,
  customer_region,
  MEASURE(total_revenue)
FROM main.sales.sales_metrics
GROUP BY order_month, customer_region;

But this sacrifices flexibility—metric views intentionally avoid pre-aggregation to enable ad-hoc slicing.

Scale and Latency

Performance scales with warehouse size:

  • Serverless SQL: Auto-scales based on query complexity (recommended for ad-hoc analytics)
  • Classic warehouses: Fixed cluster size (Small, Medium, Large)

Real-world benchmarks (Databricks internal tests):

  • 10M row fact table, 100K dimension rows: 1-2 seconds (Small warehouse)
  • 500M row fact table, 5M dimension rows: 5-8 seconds (Medium warehouse)
  • 10B row fact table, 100M dimension rows: 30-60 seconds (Large warehouse with Photon)

These match hand-written SQL performance—the metric view abstraction doesn't slow queries.

Cost optimization: By centralizing metric logic, teams avoid redundant computation. Example: Before metric views, three dashboards each ran expensive revenue queries separately. With metric views, all three use the same definition—warehouse result cache serves two of them instantly.

Governance and Trust

Unity Catalog Integration

Metric views are first-class Unity Catalog objects, inheriting all governance features:

Access control:

sql
GRANT SELECT ON METRIC VIEW main.sales.sales_metrics TO `marketing_team`;

Users need SELECT on the metric view, not on underlying tables—enabling governed self-service.

Row-level security: Enforced at the table level. If main.sales.orders has row filters (e.g., WHERE region = current_user_region()), those filters apply transparently to metric view queries.

Column masking: Similarly, if customer_email is masked for non-privileged users, queries via metric views respect that masking.

Certification and Trust Signals

Mark metrics as certified (trusted for production use):

sql
ALTER METRIC VIEW main.sales.sales_metrics
SET TBLPROPERTIES ('certified' = 'true', 'owner' = 'finance_team');

In Databricks UI, certified metrics display a trust badge, helping users distinguish official metrics from experimental ones.

Tagging:

sql
ALTER METRIC VIEW main.sales.sales_metrics
SET TAGS ('domain' = 'finance', 'pii' = 'false');

Tags enable:

  • Discovery (filter metrics by domain)
  • Compliance (track which metrics contain PII)
  • Cost allocation (chargeback by domain)

Lineage Visualization

Unity Catalog UI shows end-to-end lineage for metric views:

S3 raw data → Bronze tables → Silver tables → Gold tables → Metric View → Tableau Dashboard

Clicking on a metric view reveals:

  • Upstream dependencies (which tables feed it)
  • Downstream consumers (which dashboards, notebooks, or ML jobs query it)
  • Schema evolution (how the metric definition has changed over time)

Programmatic lineage access:

sql
SELECT * FROM system.access.table_lineage
WHERE target_name = 'main.sales.sales_metrics';

Audit and Observability

Query logging: Every metric view query is logged in:

sql
SELECT * FROM system.access.audit
WHERE request_params.metric_view = 'main.sales.sales_metrics';

Track:

  • Who queried which metrics
  • Query frequency (identify popular metrics for optimization)
  • Performance patterns (slow queries, large result sets)

Insights tab (UI): Each metric view has an "Insights" dashboard showing:

  • Most frequently queried measures
  • Top users
  • Query latency percentiles
  • Last refresh timestamp (for metrics views with scheduled refresh)

Ecosystem Integration

Databricks-Native Tools

Databricks SQL Editor: Autocompletes metric and dimension names:

sql
SELECT
  customer_region,  -- Autocompletes dimensions
  MEASURE(total_revenue),  -- Autocompletes measures
  MEASURE(avg_order_value)
FROM main.sales.sales_metrics
-- WHERE autocompletes dimensions
WHERE product_category = 'Electronics'
GROUP BY customer_region;

Databricks Notebooks: Query from Python/Scala/R:

python
df = spark.sql("""
  SELECT
    customer_region,
    MEASURE(total_revenue) as revenue
  FROM main.sales.sales_metrics
  GROUP BY customer_region
""")

display(df)

Databricks Assistant (AI helper): Natural language queries use metric views:

User: "What was revenue by region last quarter?"
Assistant:
  → Identifies metric view: sales_metrics
  → Generates SQL with MEASURE(total_revenue)
  → Returns results with explanation

BI Tool Integrations

Announced partnerships (late 2025):

  • Tableau (Salesforce): Native connector to pull Unity Catalog metrics
  • Power BI (Microsoft): Via enhanced ODBC driver
  • Sigma Computing: Direct metric browsing and drag-and-drop
  • ThoughtSpot: AI-driven search over metrics
  • Hex: Notebook integration with metric autocomplete

Current state: Most tools can query via SQL passthrough:

sql
-- In Tableau custom SQL
SELECT
  customer_region,
  MEASURE(total_revenue) AS revenue
FROM main.sales.sales_metrics
GROUP BY customer_region

Future state: Native connectors will:

  1. Connect to Unity Catalog
  2. List available metric views
  3. Present measures/dimensions as draggable fields
  4. Generate MEASURE() queries automatically

This matches the "semantic layer as OLAP cube" UX many analysts expect.

AI and ML Integration

Databricks Assistant + Genie: Conversational analytics powered by metric views:

User: "Compare Q3 vs Q2 revenue growth by product category"
Genie:
  → Uses sales_metrics metric view
  → Computes MEASURE(total_revenue) for Q3 and Q2
  → Calculates growth rate
  → Returns chart + natural language summary

LLM integration patterns: External AI agents (ChatGPT, Claude) can query Databricks via:

  1. JDBC/ODBC connection: AI writes MEASURE() queries
  2. Databricks API: REST calls to execute SQL programmatically
  3. Lakehouse Federation: Query Unity Catalog metrics from external tools

ML feature engineering: Use metrics as features:

python
# Pull metric aggregations for ML training
feature_df = spark.sql("""
  SELECT
    customer_id,
    MEASURE(total_lifetime_revenue) AS ltv,
    MEASURE(avg_purchase_frequency) AS frequency
  FROM main.sales.customer_metrics
  WHERE signup_date < '2025-01-01'
""")

# Train churn model using governed metrics
from pyspark.ml.classification import LogisticRegression
model = LogisticRegression().fit(feature_df)

This ensures ML models and BI dashboards use identical metric definitions (e.g., both define "LTV" the same way).

Data Marketplace / Discovery

Databricks Data Marketplace (formerly Discover) surfaces metric views as data products:

Organization by domain:

  • Finance domain → revenue_metrics, cost_metrics, margin_metrics
  • Marketing domain → campaign_metrics, attribution_metrics
  • Sales domain → pipeline_metrics, quota_metrics

Self-service catalog:

  • Search: "revenue" → finds all metrics containing "revenue"
  • Browse: "Finance" domain → see certified metrics
  • Request access: Click "Request Access" → approval workflow via Unity Catalog

Metric view "product page":

  • Description: What this metric measures
  • Owner: finance_team@company.com
  • Certification status: ✅ Certified
  • Sample queries: Example SQL to get started
  • Lineage: Upstream data sources
  • SLA: Updated daily at 6 AM UTC

This turns metrics into discoverable, trustworthy assets rather than tribal knowledge.

Strengths

✅ Unified lakehouse semantics: Same metrics for SQL, Python, ML, and AI—breaking down silos between data science and BI
✅ Unity Catalog governance: RBAC, lineage, audit logs, certification—enterprise-grade trust
✅ High performance: Photon + Delta Lake handle billion-row datasets with sub-minute latency
✅ Domain-driven organization: Metrics grouped by business domain (Finance, Marketing) for intuitive discovery
✅ Flexible querying: Ad-hoc dimension slicing without pre-aggregation (no explosion of materialized tables)
✅ AI/ML first-class support: Databricks Assistant and LLM integrations use metrics natively

Limitations

❌ Databricks-only: Metrics work exclusively within Databricks (can't query from Snowflake or BigQuery)
❌ New feature, limited docs: Best practices still emerging, fewer community examples than dbt
❌ MEASURE() syntax constraint: Must use the MEASURE() function (can't treat metric views like normal tables)
❌ Join limitations: Can't arbitrarily join metric views to other tables in one query (must use CTEs or include tables in the metric view definition; see the CTE sketch below)
❌ Unity Catalog requirement: Requires migrating from legacy Hive Metastore (if not already done)
❌ Time grain handling: No automatic time shortcuts (must define order_month, order_quarter dimensions manually)
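
As a workaround sketch for the join limitation above, aggregate the metric view in a CTE and join the result to an ordinary table (the targets table here is hypothetical):

sql
WITH revenue_by_region AS (
  SELECT
    customer_region,
    MEASURE(total_revenue) AS revenue
  FROM main.sales.sales_metrics
  GROUP BY customer_region
)
SELECT
  r.customer_region,
  r.revenue,
  t.target_amount
FROM revenue_by_region r
JOIN main.planning.region_targets t  -- hypothetical targets table
  ON r.customer_region = t.region;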

Ideal Use Cases

✅ Best fit:

  • Lakehouse-centric organizations (Databricks for data engineering, ML, and BI)
  • Data science + BI alignment (ensure analysts and data scientists use same metrics)
  • Unity Catalog users (extend existing governance to metrics)
  • Domain-driven data architectures (Finance, Marketing, Sales domains)
  • AI-heavy analytics (LLM agents, conversational BI)
  • Large-scale data (billions of rows, Spark's parallelism shines)

❌ Poor fit:

  • Multi-platform environments (data split across Snowflake + Databricks)
  • Teams without Databricks adoption
  • Organizations wanting vendor-neutral metrics
  • Immediate need for broad BI tool support (native integrations still ramping up)
  • Teams preferring pure SQL without new syntax (MEASURE() function)

Head-to-Head Comparison

Architecture Comparison

| Dimension | dbt MetricFlow | Snowflake Semantic Views | Databricks Metric Views |
|---|---|---|---|
| Execution Model | Query generator (compiles SQL, runs on external warehouse) | Native database object (executed by Snowflake engine) | Native catalog object (executed by Spark SQL / Photon) |
| Infrastructure | Requires dbt Cloud or self-hosted service | Zero external infrastructure (built into Snowflake) | Zero external infrastructure (built into Unity Catalog) |
| Query Interface | JDBC (Arrow Flight) + GraphQL API | SEMANTIC_VIEW() SQL clause | MEASURE() SQL function |
| Platform Lock-in | None (works with Snowflake, BigQuery, Redshift, Databricks) | Snowflake only | Databricks only |
| Definition Storage | YAML files in dbt project (git-versioned) | Snowflake schema object (DDL or imported YAML) | Unity Catalog object (YAML or SQL DDL) |

Feature Comparison Matrix

| Feature | dbt MetricFlow | Snowflake Semantic Views | Databricks Metric Views |
|---|---|---|---|
| Simple Metrics | ✅ Full support | ✅ Full support | ✅ Full support |
| Ratio Metrics | ✅ Native type (numerator/denominator) | ✅ Via expressions | ✅ Via expressions |
| Derived Metrics | ✅ Native type (formula from other metrics) | ✅ Metrics can reference metrics | ✅ Measures can reference measures |
| Conversion Metrics | ✅ Native type (cohort-based, time-windowed) | ⚠️ Manual SQL required | ⚠️ Manual SQL required |
| Time Grains | ✅ Automatic (metric__day, metric__month) | ⚠️ Define dimensions per grain | ⚠️ Define dimensions per grain |
| Multi-hop Joins | ✅ Semantic graph auto-joins | ✅ Relationship definitions | ✅ Join specs in YAML |
| Star Schema | ✅ First-class support | ✅ Logical tables + relationships | ✅ Explicit join definitions |
| Version Control | ✅ Native (YAML in git) | ⚠️ Manual (export YAML) | ⚠️ Manual (export YAML) |
| Lineage UI | ✅ dbt Explorer (visual DAG) | ⚠️ Account Usage queries (not visual by default) | ✅ Unity Catalog UI (visual graph) |
| Testing Framework | ✅ dbt tests on metrics | ❌ Manual validation | ⚠️ Spark assertions (not native) |
| Documentation | ✅ dbt docs site (auto-generated) | ⚠️ Comments in DDL | ✅ Unity Catalog descriptions |
| RBAC | ⚠️ Warehouse-level (not metric-level) | ✅ Per semantic view | ✅ Per metric view |
| Row-Level Security | ⚠️ Via warehouse policies | ✅ Via Snowflake RLS | ✅ Via Unity Catalog RLS |
| Certification/Trust Signals | ⚠️ Via meta tags (not UI-native) | ⚠️ Planned (not in GA) | ✅ Native certification flag |

Performance Comparison

| Aspect | dbt MetricFlow | Snowflake Semantic Views | Databricks Metric Views |
|---|---|---|---|
| Query Compilation | Fast (milliseconds) | Native (part of query planning) | Native (Catalyst optimizer) |
| Execution Speed | Depends on warehouse (Snowflake, BigQuery, etc.) | Native Snowflake performance | Spark SQL + Photon (vectorized) |
| Network Overhead | +1 hop (through dbt Cloud API) | None (native) | None (native) |
| Result Transfer | Arrow Flight (efficient) | Standard Snowflake result set | Standard Spark result set |
| Caching | Warehouse caching only | Result cache + warehouse cache | Delta cache + result cache + AQE |
| Large Data (1B+ rows) | ✅ (warehouse-dependent) | ✅ (Snowflake's scale) | ✅ (Spark's distributed compute) |
| Latency (typical query) | 100-500ms compile + warehouse time | Native warehouse time | Native Spark time |
| Cost Efficiency | One metric definition → reduces redundant queries | Same (centralized logic) | Same + AQE optimizes at runtime |

Verdict: Performance is equivalent when each runs in its native environment. The "extra hop" in dbt MetricFlow is negligible (~100ms) vs. the cost of the actual query (seconds). All three eliminate the biggest performance problem: redundant, unoptimized SQL across multiple tools.

Governance Comparison

| Feature | dbt MetricFlow | Snowflake Semantic Views | Databricks Metric Views |
| --- | --- | --- | --- |
| Access Control Granularity | Warehouse-level | Semantic view-level | Metric view-level |
| Row-Level Security | Via warehouse | Via Snowflake RLS | Via Unity Catalog RLS |
| Column Masking | Via warehouse | Via Snowflake masking policies | Via Unity Catalog masking |
| Audit Logging | Via warehouse query logs | Account Usage views | Unity Catalog audit logs |
| Lineage | Visual DAG (dbt Explorer) | SQL-based queries (ACCOUNT_USAGE) | Visual graph (Unity Catalog UI) |
| Version Control | Native (git-based) | Manual export/import | Manual export/import |
| Testing | dbt tests (data + schema) | Manual | Spark assertions |
| Certification | Meta tags only | Planned | Native "certified" flag |
| Data Classification | Tags (custom) | Tags (custom) | Tags + domains (native) |

Winner: Databricks for native governance features (certification, domains, tags). dbt wins for version control and testing. Snowflake offers strong security but lags on native trust signals.

Ecosystem Integration Scorecard

| Tool/Use Case | dbt MetricFlow | Snowflake Semantic Views | Databricks Metric Views |
| --- | --- | --- | --- |
| Tableau | ✅ JDBC connector | 🚧 OSI partnership (in progress) | 🚧 Native connector (announced) |
| Power BI | ✅ JDBC connector | ✅ ODBC (custom SQL) | 🚧 Enhanced ODBC (announced) |
| Looker | ✅ JDBC connector | ⚠️ Via custom SQL | ⚠️ Via custom SQL |
| Mode | ✅ Native integration | ⚠️ Custom SQL | ⚠️ Custom SQL |
| Hex | ✅ Native integration | ⚠️ Custom SQL | 🚧 Native integration (announced) |
| Sigma | ✅ Via JDBC | 🚧 OSI partnership | 🚧 Native connector (announced) |
| ThoughtSpot | ✅ Native integration | 🚧 OSI partnership | 🚧 Native integration (announced) |
| Excel / Sheets | ✅ Via JDBC plugins | ⚠️ Custom SQL | ⚠️ Via ODBC |
| Notebooks | ✅ Python library | ❌ SQL only (in Snowflake notebooks) | ✅ Native (Databricks notebooks) |
| Custom Apps | ✅ GraphQL API | ⚠️ Cortex API (limited) | ✅ Databricks API |
| LLM / AI Agents | ✅ GraphQL + semantic definitions | ✅ Cortex Analyst (native) | ✅ Databricks Assistant (native) |

Legend: ✅ = Available now | 🚧 = Announced/in progress | ⚠️ = Requires workaround | ❌ = Not supported

Winner: dbt MetricFlow for current breadth (JDBC = universal compatibility). Snowflake and Databricks are catching up rapidly via OSI partnerships.
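
The "JDBC = universal compatibility" point works because any tool that can send SQL over a JDBC connection can request governed metrics. As a rough illustration, dbt's Semantic Layer JDBC API accepts templated queries along these lines (the metric and group-by names here are hypothetical, and exact argument forms should be checked against dbt's documentation):

```sql
-- Sent over the dbt Semantic Layer JDBC (Arrow Flight SQL) connection;
-- MetricFlow compiles this into warehouse-specific SQL before execution.
-- Metric and group-by names are hypothetical.
SELECT *
FROM {{
    semantic_layer.query(
        metrics=['total_revenue'],
        group_by=['metric_time']
    )
}}
```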

Metric Type Support Deep Dive

| Metric Type | dbt MetricFlow | Snowflake Semantic Views | Databricks Metric Views |
| --- | --- | --- | --- |
| Simple Aggregation | type: simple | SUM(col) | expr: SUM(col) |
| Ratio (correct aggregation) | type: ratio (native numerator/denominator) | SUM(a) / SUM(b) | expr: SUM(a) / SUM(b) |
| Derived (metric from metrics) | type: derived | metric_a - metric_b | expr: measure_a - measure_b |
| Distinct Count | agg: count_distinct | COUNT(DISTINCT col) | expr: COUNT(DISTINCT col) |
| Conversion Funnel | type: conversion (cohort-based, time-windowed) | ❌ Requires complex SQL | ❌ Requires complex SQL |
| Cumulative | type: cumulative | ⚠️ Via window functions | ⚠️ Via window functions |
| Custom Expressions | expr: "..." | ✅ Any SQL expression | ✅ Any Spark SQL expression |
| Percentiles | ✅ Via custom expression | PERCENTILE_CONT(...) | PERCENTILE_APPROX(...) |

Winner: dbt MetricFlow for native support of conversion and cumulative metrics (no manual SQL needed). Others require writing window functions or subqueries manually.
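
To see what the "manual SQL" in the conversion-funnel row actually costs, here is a rough, hand-written sketch of a 7-day visit-to-purchase conversion by weekly cohort (Snowflake-flavored SQL; fct_visits, fct_purchases, and their columns are hypothetical). dbt's type: conversion expresses the same cohort-and-window logic declaratively instead.

```sql
-- Hand-written 7-day visit-to-purchase conversion by weekly cohort.
-- Table and column names are hypothetical; dialect is Snowflake-flavored.
SELECT
    DATE_TRUNC('week', v.visit_ts)             AS cohort_week,
    COUNT(DISTINCT v.user_id)                  AS visitors,
    COUNT(DISTINCT p.user_id)                  AS converted_within_7d,
    COUNT(DISTINCT p.user_id)
      / NULLIF(COUNT(DISTINCT v.user_id), 0)   AS conversion_rate
FROM fct_visits v
LEFT JOIN fct_purchases p
  ON  p.user_id     = v.user_id
  AND p.purchase_ts >= v.visit_ts
  AND p.purchase_ts <  DATEADD(day, 7, v.visit_ts)   -- 7-day conversion window
GROUP BY 1;
```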

Total Cost of Ownership

| Cost Factor | dbt MetricFlow | Snowflake Semantic Views | Databricks Metric Views |
| --- | --- | --- | --- |
| License Cost | dbt Cloud Team ($100+/user/mo) or self-host (free but ops overhead) | Included with Snowflake (no extra cost) | Included with Databricks (no extra cost) |
| Compute Cost | Warehouse costs (Snowflake, BigQuery, etc.) | Snowflake warehouse costs | Databricks SQL warehouse costs |
| Ops Overhead | Medium (manage dbt Cloud or self-hosted service) | Low (native, zero ops) | Low (native, zero ops) |
| Learning Curve | Steep (YAML semantic models, dbt concepts) | Medium (new SQL syntax, Snowsight UI available) | Medium (YAML or DDL, MEASURE() syntax) |
| Migration Cost | High if not already using dbt | Low for Snowflake customers | Low for Databricks customers |
| Ecosystem Lock-in | Low (portable metrics) | High (Snowflake only) | High (Databricks only) |

For a 50-person data team:

  • dbt MetricFlow: $5K/mo (dbt Cloud) + warehouse compute ($10K/mo) = ~$15K/mo
  • Snowflake Semantic Views: No license cost + warehouse compute (~$10K/mo) = ~$10K/mo
  • Databricks Metric Views: No license cost + SQL warehouse compute (~$12K/mo) = ~$12K/mo

Note: Actual costs vary wildly based on data volume, query frequency, and warehouse sizing. The real TCO question: "Does centralizing metrics reduce redundant queries enough to offset any license costs?" Answer: Usually yes.


Decision Framework: Which Semantic Layer Should You Choose?

The One-Question Test

"Where does your data live, and where will it live in 3 years?"

  • One warehouse, staying that way → Native solution (Snowflake or Databricks)
  • Multiple warehouses, or uncertain → dbt MetricFlow
  • Migrating between platforms → dbt MetricFlow (portable metrics)

This single question eliminates ~70% of the decision complexity. Now let's refine.


Decision Tree

START: Do you have multiple data warehouses (Snowflake + BigQuery, for example)?
│
├─ YES → dbt MetricFlow
│   └─ (Only vendor-neutral option)
│
└─ NO → Is your warehouse Snowflake or Databricks?
    │
    ├─ Snowflake → Do you also use Databricks for ML/data science?
    │   │
    │   ├─ YES → Choose based on primary workload:
    │   │   ├─ Mostly BI/SQL analytics → Snowflake Semantic Views
    │   │   └─ Heavy ML + BI integration → Databricks Metric Views
    │   │
    │   └─ NO → Snowflake Semantic Views
    │       └─ UNLESS you want vendor portability → dbt MetricFlow
    │
    └─ Databricks → Do you need metrics in external tools (Tableau on Snowflake)?
        │
        ├─ YES → dbt MetricFlow
        │   └─ (Databricks Metric Views don't work outside Databricks)
        │
        └─ NO → Databricks Metric Views
            └─ UNLESS you already use dbt → consider dbt MetricFlow

Choose dbt MetricFlow If...

✅ Must-have scenarios:

  1. Multi-warehouse environment: Data in Snowflake + BigQuery + Databricks
  2. Platform migration planned: Moving from Redshift to Snowflake (metrics stay portable)
  3. Vendor neutrality is strategic: Avoiding lock-in is a business requirement
  4. Heavy AI/agent use cases: Need GraphQL API for custom LLM integrations

✅ Strong fit:

  5. Mature dbt practice: Already using dbt Core or Cloud for transformations
  6. Conversion metrics needed: Tracking funnels, retention, cohort-based KPIs
  7. Metrics-as-code culture: Team values git workflows, PR reviews, CI/CD for metrics
  8. Broad BI tool landscape: Supporting 5+ different BI tools across teams

⚠️ Consider alternatives if:

  • You're not using dbt and don't want to adopt it (steep learning curve)
  • You prefer GUI-based metric management (Snowflake/Databricks have UIs)
  • Extra network hop latency is unacceptable (though Arrow mitigates this)
  • You need metric-level RBAC (dbt doesn't support per-metric permissions natively)

Migration path:

  • From spreadsheets/tribal knowledge: Medium effort (requires learning YAML modeling)
  • From Looker/Tableau calculations: High effort (translate all calculated fields to metrics)
  • From other semantic layers: Medium effort (many have YAML export → import to MetricFlow)

Team requirements:

  • Skills needed: dbt familiarity, YAML, SQL, git workflows
  • FTE estimate: 1 analytics engineer (initially) + 0.25 FTE ongoing maintenance
  • Time to value: 1-2 months (model core metrics, integrate with 1-2 BI tools)

Choose Snowflake Semantic Views If...

✅ Must-have scenarios:

  1. Snowflake-only shop: All data in Snowflake, no plans to multi-cloud
  2. Zero-infrastructure preference: Don't want external services (dbt Cloud, etc.)
  3. Cortex AI is strategic: Using Snowflake's LLM features heavily
  4. Simplicity over flexibility: Prefer built-in features vs. best-of-breed tools

✅ Strong fit:

  5. Non-technical metric authors: Business analysts who prefer UI wizards over YAML
  6. Strong governance needs: Want to expose metrics without exposing raw tables
  7. Snowsight as primary BI: Heavy Snowflake SQL Worksheets usage
  8. OSI early adopter: Betting on Snowflake's open standard implementation

⚠️ Consider alternatives if:

  • You use Databricks for data science (metrics won't transfer)
  • You need mature BI tool integrations today (most are still in progress)
  • You have data in other clouds (BigQuery, etc.)
  • You want metrics in notebooks outside Snowflake (Jupyter, Hex on separate compute)

Migration path:

  • From Snowflake views: Low effort (semantic views are just enhanced views)
  • From dbt metrics: Medium effort (export YAML, import to Snowflake)
  • From Tableau/Power BI calculations: Medium effort (translate to SQL expressions)

Team requirements:

  • Skills needed: Snowflake SQL, basic data modeling (star schemas)
  • FTE estimate: 0.5 analytics engineer (initially) + 0.1 FTE ongoing
  • Time to value: 2-4 weeks (model core metrics, train users on SEMANTIC_VIEW syntax)

Choose Databricks Metric Views If...

✅ Must-have scenarios:

  1. Lakehouse-centric architecture: Databricks for data engineering, ML, and BI
  2. ML + BI alignment needed: Data scientists and analysts must use same metrics
  3. Unity Catalog already deployed: Extending existing governance to metrics
  4. Domain-driven data org: Finance, Marketing, Sales domains with dedicated metric sets

✅ Strong fit:

  5. Python/Spark-heavy teams: Notebooks are the primary interface (not SQL-first BI tools)
  6. Databricks Assistant users: Leveraging conversational analytics
  7. Large-scale data (10B+ rows): Need Spark's distributed compute for metric queries
  8. Real-time + batch analytics: Delta Lake's streaming + batch unification

⚠️ Consider alternatives if:

  • You use Snowflake or BigQuery as primary warehouse
  • You need broad BI tool support immediately (native connectors still ramping up)
  • You're not on Unity Catalog (requires migration from Hive Metastore)
  • Your team is SQL-only (MEASURE() syntax might confuse pure SQL users)

Migration path:

  • From Databricks tables/views: Low effort (wrap tables in metric views; see the sketch at the end of this subsection)
  • From dbt metrics: Medium effort (export YAML, import to Unity Catalog)
  • From ML feature stores: Low effort (features → measures)

Team requirements:

  • Skills needed: Spark SQL, Unity Catalog, YAML or SQL DDL
  • FTE estimate: 0.5-1 data engineer (initially) + 0.25 FTE ongoing
  • Time to value: 3-6 weeks (model metrics, integrate with notebooks + Databricks SQL)
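
As a sense of that modeling effort, here is a minimal sketch of wrapping an existing fact table in a metric view via the SQL DDL path, assuming the documented metric-view YAML keys (version, source, dimensions, measures); the catalog, table, and column names are hypothetical.

```sql
-- Minimal metric view over an existing fact table (names are hypothetical).
-- Analysts then query it with MEASURE(), e.g.:
--   SELECT region, MEASURE(total_revenue) FROM main.analytics.sales_metrics GROUP BY region;
CREATE VIEW main.analytics.sales_metrics
WITH METRICS
LANGUAGE YAML
AS $$
version: 0.1
source: main.analytics.fct_orders
dimensions:
  - name: region
    expr: region
  - name: order_date
    expr: CAST(order_ts AS DATE)
measures:
  - name: total_revenue
    expr: SUM(amount)
  - name: distinct_customers
    expr: COUNT(DISTINCT customer_id)
$$;
```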

Special Scenarios

Scenario: Multi-cloud with Snowflake + Databricks

Recommendation: dbt MetricFlow

  • Define metrics once in dbt YAML
  • Use dbt adapters for both Snowflake and Databricks
  • Metrics work across both warehouses
  • Some duplication (metrics query Snowflake data separately from Databricks data), but logic is centralized

Scenario: Heavy Tableau + Snowflake, but also Databricks for ML

Recommendation: Snowflake Semantic Views (BI) + dbt MetricFlow (cross-platform)

  • Use Snowflake Semantic Views for Tableau (native integration coming)
  • Use dbt MetricFlow for metrics that span both Snowflake and Databricks
  • Accept some duplication, but prioritize user experience per tool

Scenario: Startup with <50M rows, uncertain future platform

Recommendation: dbt MetricFlow

  • Portability insurance: if you outgrow Snowflake and move to BigQuery, metrics come with you
  • Lower data volumes = performance differences negligible
  • Future-proof for growth

Scenario: Enterprise with 10,000+ employees, strong Databricks commitment

Recommendation: Databricks Metric Views

  • Scale is huge: need Spark's distributed compute
  • Governance is critical: Unity Catalog's certification + domains shine at enterprise scale
  • Tight integration with Databricks AI features (Databricks Assistant, LLM-powered analytics)

Scenario: Finance team (Snowflake) + Marketing team (BigQuery)

Recommendation: dbt MetricFlow

  • Only option that works across both warehouses
  • Define "revenue" once, query from Snowflake or BigQuery depending on source data

Switching Costs: Can You Change Your Mind?

From dbt MetricFlow to Snowflake/Databricks:

  • Effort: Medium (YAML export, translate to native syntax)
  • Downside: Lose multi-warehouse support
  • Upside: Simpler ops (no dbt Cloud), native performance

From Snowflake to dbt MetricFlow:

  • Effort: Medium (export YAML, set up dbt project)
  • Downside: Adds dbt infrastructure
  • Upside: Gain portability, richer metric types (conversion metrics)

From Databricks to dbt MetricFlow:

  • Effort: Medium (export YAML, set up dbt adapters)
  • Downside: Lose Unity Catalog governance integration
  • Upside: Portability, broader BI tool support (today)

Between Snowflake and Databricks:

  • Effort: High (requires moving data or using federation)
  • Reality: You're really switching warehouses, not just semantic layers
  • Advice: Don't choose a semantic layer if the warehouse is wrong

The OSI wildcard: If Open Semantic Interchange delivers on its promise, switching may become trivial (export standard YAML, import to any OSI-compatible system). But that future is 1-2 years away—make today's decision based on today's reality.


Conclusion: The Future of Semantic Layers

The Industry Inflection Point

2025 will be remembered as the year semantic layers went mainstream. Three simultaneous forces converged:

1. The AI Imperative

LLMs moved from demos to production analytics. But "agentic analytics" requires deterministic answers; LLMs writing their own SQL produce confidently wrong results. Semantic layers provide the grounding AI needs: pre-defined metrics that an LLM can reference by name instead of hallucinating table joins.

Snowflake's internal tests report 85% accuracy for natural-language queries with semantic views vs. 40% without, and dbt reports similar numbers. This isn't an incremental improvement; it's the difference between "interesting toy" and "production tool."

2. The Multi-Tool Reality

The "single BI tool" dream is dead. Modern data teams use Tableau (executives), Mode (analysts), Hex (data scientists), and custom apps (product teams) simultaneously. Maintaining consistent metrics across 5+ tools is untenable without centralization.

Semantic layers are the API for metrics—define once, consume everywhere.

3. Cloud Warehouse Maturity

Snowflake, Databricks, and BigQuery are now powerful enough to serve as the computation layer for semantic layers. Early semantic layers (1990s OLAP cubes) required separate infrastructure; modern approaches push computation to the warehouse, eliminating ETL and stale data problems.

The OSI Wild Card

The Open Semantic Interchange initiative (dbt, Snowflake, Salesforce, ThoughtSpot, etc.) aims to create a vendor-neutral standard for semantic layer definitions. The vision:

  1. Define metrics once in standard YAML
  2. Register with any OSI-compatible platform (Snowflake, Databricks, dbt, BigQuery)
  3. Query from any OSI-compatible tool (Tableau, Looker, ThoughtSpot, custom apps)

If OSI succeeds, the decision between dbt/Snowflake/Databricks becomes less critical—you'd export metrics from one and import to another with zero rework.

Current reality: OSI is early (working groups, draft specs). Expect meaningful interoperability in 2026-2027, not today.

Our take: OSI will likely succeed for simple metrics (aggregations, basic ratios). Complex features (conversion metrics, advanced time grains) will remain platform-specific. Think of it like SQL itself—there's a standard, but every vendor has dialect extensions.

Where Each Solution Is Headed

dbt MetricFlow's roadmap (based on public statements and OSI participation):

  • Tighter BI tool integrations (native Tableau/Power BI support)
  • Advanced metric types (more complex time-based calculations)
  • Self-service metric creation (web UI, not just YAML)
  • Broader OSI leadership (dbt is de facto driving the standard)

Snowflake Semantic Views' roadmap:

  • Data Sharing for semantic views (share metrics across accounts)
  • Advanced time intelligence (fiscal calendars, custom hierarchies)
  • Snowsight enhancements (drag-and-drop metric exploration)
  • Deeper Cortex integration (semantic views as default for all AI features)

Databricks Metric Views' roadmap:

  • Native BI tool connectors (Tableau, Power BI, Sigma all announced)
  • MLflow integration (metrics → features → models with lineage)
  • Advanced governance (metric-level masking, fine-grained RBAC)
  • Real-time metrics (streaming aggregations via Delta Live Tables)

The Uncomfortable Truth

There is no "best" semantic layer—only the best fit for your architecture.

  • If you're multi-cloud: dbt MetricFlow (only option)
  • If you're Snowflake-only: Snowflake Semantic Views (simplest)
  • If you're Databricks-centric: Databricks Metric Views (deepest integration)

Trying to force a different choice creates friction:

  • Using dbt MetricFlow in a Snowflake-only shop adds complexity for flexibility you don't need
  • Using Snowflake Semantic Views when you also use Databricks for ML creates metric inconsistency
  • Using Databricks Metric Views when most BI is in Tableau on Snowflake creates integration headaches

Match your semantic layer to your data platform—don't fight your stack.

What to Do Tomorrow

If you're evaluating semantic layers:

  1. Answer the one-question test: "Where does your data live?" (Single warehouse vs. multi-cloud)
  2. Prototype with 5-10 core metrics (revenue, active users, etc.) using the recommended solution for your stack
  3. Test with 2-3 BI tools (validate the integration story before full rollout)
  4. Measure impact: Track "time to answer" for common questions (before vs. after semantic layer)
  5. Iterate: Start with one domain (Finance, Sales), prove value, then expand

If you already have one of these:

  • dbt MetricFlow users: Explore the GraphQL API for AI use cases (chatbots, agents)
  • Snowflake users: Enable Semantic Views in dev, test with Cortex Analyst
  • Databricks users: Create metric views for top-10 KPIs, train analysts on MEASURE() syntax

The 2026 Prediction

By the end of 2026:

  • ✅ Semantic layers will be table stakes (not having one will be seen as technical debt)
  • ✅ BI tool vendors will offer native semantic layer connectors (as standard as database drivers)
  • ✅ "Metrics engineers" will be a distinct role (like analytics engineers emerged for dbt)
  • ✅ AI agents will default to semantic layers (raw SQL generation will be seen as risky)
  • ⚠️ OSI will have a 1.0 spec (but adoption will still be fragmented)
  • ❓ One solution might dominate (if Snowflake's market share continues growing, or if dbt's openness wins over multi-cloud shops)

The semantic layer war is just beginning. Choose your weapon wisely—but don't wait. The cost of metric chaos compounds daily, and your competitors are already defining their single source of truth.

The question isn't "Do we need a semantic layer?" It's "Which one, and how fast can we deploy it?"


Further Reading

  • "The Rise of the Semantic Layer" - dbt Labs blog
  • "Snowflake Summit 2025 Keynote" - Semantic Views announcement video
  • "Unity Catalog: The Complete Guide" - Databricks ebook
