<< goback()

What Are Metrics in Unity Catalog? Databricks' Governed Metric Layer Explained

Typedef Team

What Are Metrics in Unity Catalog? Databricks' Governed Metric Layer Explained

Defining a metric once and querying it from SQL, BI, and AI agents is an improvement. A governed metric object still cannot tell you whether the number it just handed back is correct, and there are three reasons. The property that decides correctness is hand-declared. It breaks under the materialization and BI rewrites, by Databricks' own documentation. And it is invisible when the metric sits on a source that already collapsed the grain upstream.

1. What Metrics in Unity Catalog is, in one paragraph

Metrics in Unity Catalog, which Databricks ships under the name Unity Catalog Business Semantics, is a governed semantic layer built into the catalog. The core object is a metric view. A metric view is a Unity Catalog view, written in YAML, that separates dimensions (what you group and filter by, e.g. month) from measures (the numbers you report, e.g. total revenue). You define a measure once, and anyone querying the view can group it by any dimension they want. The view recomputes the measure at query time, so the same definition returns consistent numbers from SQL, dashboards, notebooks, and AI/BI Genie. Metric views went generally available in April 2026, and Databricks is open-sourcing the implementation into Apache Spark. This is better than the old pattern of copying the same CASE WHEN and SUM logic across many dashboards and hoping they agree. The catch, and the reason this post exists, is that a governed definition can be wrong, and the catalog cannot tell you that it is wrong.

2. How a metric view actually works

2.1 The YAML: source, dimensions, and measures

A metric view points at a source table, lists the dimensions people can group by, and lists the measures and how each one aggregates. Here is a complete, valid metric view:

sql
CREATE OR REPLACE VIEW main.sales.orders_metrics WITH METRICS LANGUAGE YAML AS $$
version: 1.1
comment: "Orders and revenue KPIs for sales analysis"
source: samples.tpch.orders
dimensions:
  - name: order_month
    expr: DATE_TRUNC('MONTH', o_orderdate)
  - name: market_segment
    expr: o_orderpriority
measures:
  - name: total_revenue
    expr: SUM(o_totalprice)
  - name: unique_customers
    expr: COUNT(DISTINCT o_custkey)
$$;

Two things to notice. First, a measure is just a SQL aggregate expression, e.g. SUM(o_totalprice). Second, the view stores the definition, not a result. It does not store a precomputed total.

One small compatibility note. Databricks docs use both dimensions: and fields: for the grouping block, depending on the YAML spec version and the page you read. This post uses dimensions:. If your runtime rejects it, switch the key to fields: and the rest is the same.

2.2 Why it recomputes at query time

You do not read a measure column directly. You wrap it in the MEASURE() function, and you choose the grouping at query time:

sql
SELECT
  order_month,
  MEASURE(total_revenue)   AS total_revenue,
  MEASURE(unique_customers) AS unique_customers
FROM main.sales.orders_metrics
GROUP BY order_month
ORDER BY order_month;

MEASURE() does not specify the aggregation. It inherits it from the view definition and applies it to whatever group you asked for. Group by month and you get monthly numbers. Group by segment and you get per-segment numbers. The aggregation logic lives in one place, and the caller picks the grain.

This is the feature's biggest strength. Because the measure is recomputed from the source at query time, a COUNT(DISTINCT o_custkey) defined on raw order rows returns the correct distinct count at every grain. Group by month, quarter, or year, and each one is a fresh distinct count over the rows in that period. Nothing is summed from a stored subtotal. Remember this, because it is what makes the failures later in this post hard to see.

Creating or editing a metric view needs a SQL warehouse or compute on a recent Databricks Runtime (17.3 or later at the time of writing, but check the current minimum in Databricks' create-edit docs) and Unity Catalog. There is no CREATE METRIC VIEW statement. It is a normal CREATE VIEW with the WITH METRICS clause, and you edit it later with ALTER VIEW.

3. How Metrics in Unity Catalog relate to dbt

3.1 The metric view sits on a table, and dbt usually built that table

A metric view does not transform data. It sits on top of a table and adds a query-time aggregation layer. In most Databricks shops, the table underneath was built by dbt. The joins that changed cardinality, the COUNT(DISTINCT) that got rolled into a daily column, and the grain the model chose all live in the dbt model, and the metric view never reads that SQL. The metric view inherits whatever shape the dbt model produced. dbt is the layer where the answer to "is this number valid" lives, and the metric view sits downstream of it.

3.2 Defining metric views from dbt

You can now manage the metric view from dbt instead of hand-writing DDL. As of dbt-databricks v1.12.0 (released May 2026), there is a dedicated materialization. You set materialized='metric_view' and put the metric-view YAML in the model body:

sql
-- models/metrics/orders_metrics.sql
-- Requires dbt-databricks >= 1.12.0 on Unity Catalog (Databricks Runtime 17.3+)

{{ config(materialized='metric_view') }}

version: 1.1
source: "{{ ref('fct_orders') }}"
dimensions:
  - name: order_month
    expr: DATE_TRUNC('MONTH', order_date)
measures:
  - name: total_revenue
    expr: SUM(revenue)
  - name: unique_customers
    expr: COUNT(DISTINCT customer_id)

dbt compiles this to the same CREATE OR REPLACE VIEW ... WITH METRICS LANGUAGE YAML DDL from Section 2 and wires the ref() into the model graph so the view rebuilds when its source does.

Two honest caveats, because this is new. First, meta.metric_view is not the mechanism. That was one of two options proposed in the dbt-databricks feature request, and it is not what shipped: the materialization is. Second, dbt's own configuration docs do not document this yet, so the authoritative source today is the adapter repo. Run dbt compile and read the generated DDL before you rely on it in production. Treat this as an emerging integration, not a settled one.

3.3 dbt MetricFlow vs Unity Catalog metric views, and where OSI fits

The two metric layers make opposite trade-offs.

  • dbt MetricFlow is warehouse-agnostic. The definitions live in the dbt project and compile to whatever warehouse you target. It also auto-derives time grains from a single time dimension, so daily, weekly, and monthly come for free once you declare the time column.
  • Unity Catalog metric views are Databricks-only today. They use the MEASURE() function, cannot be queried from Snowflake or BigQuery, carry some join limits, require Unity Catalog, and need an explicit definition per time grain. In exchange you get the deepest native integration. They are the object AI/BI Genie reads for governed context.

If you have already modeled everything in dbt, Typedef has a walkthrough of moving dbt Semantic Layer definitions to Databricks metric views. See Migrate dbt Semantic Layer to Databricks Metric Views.

There is also a portability effort under way: the Open Semantic Interchange (OSI), an Apache-2.0 project with Snowflake, Databricks, dbt Labs, Salesforce, and others, to carry metric definitions across dialects. It is worth tracking, with the same caveat that runs through this post: it moves the declared definition between tools, not a proof that the definition is correct.

4. The part Databricks gets right about additive and non-additive measures

Databricks gets two things right here. Metric views recompute at query time, which puts them ahead of any layer that stores precomputed aggregates. And they go a step further than most semantic layers. They classify measures by how they aggregate.

Databricks' materialization documentation states this directly. A measure is additive if its result can be correctly recomputed by re-aggregating from partial results. SUM, COUNT, MIN, and MAX are additive. COUNT(DISTINCT), MEDIAN, and percentiles are not, because you cannot get a correct distinct count by adding up smaller distinct counts. Most semantic layers do not encode this distinction at all. Databricks does, and it uses that knowledge to decide when a query can safely reuse a materialization.

So the system knows about additivity. The rest of this post is about what that knowledge does not prevent.

5. Where knowing is not protecting: three places the math still breaks

Knowing a measure is non-additive is not the same as being protected from adding it wrong. There are three places the protection does not reach. None of them require the system to be buggy. Each one is the system working exactly as documented.

5.1 A metric view over a pre-aggregated source

This is the cleanest one, and you can run it yourself in the next section.

Suppose an upstream table already stores a daily active-user count: daily_active_users(activity_date, active_users), where active_users was computed as COUNT(DISTINCT user_id) per day. Now you build a perfectly reasonable metric view on top with the measure SUM(active_users), and you call it monthly active users.

Query it at the monthly grain and it inflates. It sums the daily distinct counts, so any user who was active on more than one day in the month gets counted more than once. The measure is correctly defined. It even returns the right number at the daily grain. It only goes wrong at coarser grains, which is the worst kind of wrong, because it demos fine and breaks in production.

The metric view cannot catch this, and it is not Databricks' fault. The measure expression is SUM(active_users), which is additive. The non-additivity is hidden one layer up, in the transformation that turned raw events into a daily distinct count. The catalog entry has no record of that. Additivity here is a property of the upstream code, and the catalog only sees the column.

5.2 Materialization and the BI rewrite

The query-time path is safe for distinct counts on raw data. The rewrite paths are where Databricks itself warns about wrong numbers.

To be precise, because this is a place it is easy to overstate: the materialization documentation handles non-additive measures safely. It marks them as not eligible for rollup and either requires an exact match or falls back to the source, so it avoids the wrong answer rather than serving it.

The explicit warning is on the BI compatibility mode page. In Databricks' own words, "Count (Distinct), Standard Deviation, Variance, and Median generate query patterns that are incompatible with the rewrite mechanism and produce errors or incorrect results." The same guidance flags grand totals: when a BI tool reaggregates per-group values on the client side, a non-additive measure such as a ratio over COUNT(DISTINCT) "might display incorrect values." This is the vendor describing, in its own docs, the conditions under which a governed metric returns a wrong number.

5.3 Semi-additive is a field a human fills in

Some measures are semi-additive: you can add them across some dimensions but not others. An account balance can be summed across accounts but not across days. Databricks lets you handle this, and dbt MetricFlow has its own field for it, non_additive_dimension.

The point is who fills these in. They are hand-declared. A modeler has to know a measure is semi-additive and remember to mark it. Nothing reads the transformation and concludes it. dbt's own semantic layer, which lives right next to the transformation code, still makes non_additive_dimension a manually specified field that sits empty until a human types into it. Proximity to the code is not the same as reading the code.

6. See it yourself: a metric view that inflates

This is short, runnable, and self-contained. It needs only a Databricks workspace with metric views enabled. It builds the same number two ways: once over a pre-aggregated source (wrong) and once over raw events (right). The SQL logic is the point here and it is verified; the exact metric-view keys and spec version can vary by Databricks Runtime, so if your workspace rejects a line, adjust the key or version and the result holds.

sql
-- Setup: 3 users, 2 months, some users active on more than one day
CREATE TABLE raw_events (event_date DATE, user_id INT);
INSERT INTO raw_events VALUES
  ('2024-01-01', 1), ('2024-01-15', 1),  -- user 1 active twice in January
  ('2024-01-15', 2),
  ('2024-02-02', 1),
  ('2024-02-10', 3), ('2024-02-20', 3);  -- user 3 active twice in February

-- A pre-aggregated daily table: active_users is already a per-day COUNT(DISTINCT)
CREATE TABLE daily_active_users AS
SELECT event_date AS activity_date, COUNT(DISTINCT user_id) AS active_users
FROM raw_events GROUP BY event_date;
sql
-- WRONG: a metric view over the pre-aggregated daily table
CREATE OR REPLACE VIEW mau_wrong WITH METRICS LANGUAGE YAML AS $$
version: 1.1
source: daily_active_users
dimensions:
  - name: month
    expr: DATE_TRUNC('MONTH', activity_date)
measures:
  - name: monthly_active_users
    expr: SUM(active_users)        # additive-looking, but it sums daily DISTINCT counts
$$;

SELECT month, MEASURE(monthly_active_users) AS mau
FROM mau_wrong GROUP BY month ORDER BY month;
-- January: 3, February: 3   <- WRONG
sql
-- CORRECT: a metric view over the raw events table
CREATE OR REPLACE VIEW mau_correct WITH METRICS LANGUAGE YAML AS $$
version: 1.1
source: raw_events
dimensions:
  - name: month
    expr: DATE_TRUNC('MONTH', event_date)
measures:
  - name: monthly_active_users
    expr: COUNT(DISTINCT user_id)  # recomputed from raw rows at query time
$$;

SELECT month, MEASURE(monthly_active_users) AS mau
FROM mau_correct GROUP BY month ORDER BY month;
-- January: 2, February: 2   <- CORRECT

The true monthly active users are two in January (users 1 and 2) and two in February (users 1 and 3). The pre-aggregated view reports three and three, because it double-counts user 1 in January and user 3 in February. Same business question, two governed metric views, and only one is right. The catalog created and served both without complaint, because both are valid YAML and valid SQL. Validity is not correctness.

7. Declared vs derived: what actually makes additivity safe

Look back at what went wrong. In every case the catalog held a label, and the label was either absent, hand-typed, or simply blind to a property that lived in the code one layer up. That is the pattern: a governed metric definition can label additivity, but it cannot derive it, and for the buried cases it cannot even see it. The honest version of the claim is not "the catalog is useless." It is narrower and more uncomfortable: knowing a measure's additivity is a property of the transformation that built the measure, not of the catalog entry that points at it. And on raw base data, where the aggregate is right there in the measure expression, the catalog gets it right on its own. The gap is the cases above.

7.1 Reading additivity instead of declaring it: what ships today

The alternative to a hand-filled field is to read the code. Take the measure that breaks exactly where Databricks said it would. Section 5.2 quoted the grand-totals warning: a ratio like SUM(revenue) / COUNT(DISTINCT customer_id) displays wrong values when a tool reaggregates it across groups. In Unity Catalog or in dbt, nothing stops you from shipping that measure, and whether anyone notices depends on a human spotting the problem and filling in a field.

yaml
measures:
  - name: revenue_per_customer
    expr: SUM(revenue) / COUNT(DISTINCT customer_id)

Typedef reads that expression and classifies the measure as non-additive on its own, because the COUNT(DISTINCT) is right there in the code. The same goes for an AVG(...), a MEDIAN(...), or any aggregate whose math does not survive re-aggregation. There is no non_additive_dimension field to remember and no human in the loop. This is shipped today, and it covers the common case: when the non-additive pattern is visible in the measure expression, the classifier catches it instead of waiting for someone to declare it. That is the difference between a label you hope got filled in and a property read straight off the code.

The catalog and dbt can do the other thing: they can hold the label once a human supplies it. They cannot read the expression and conclude it.

7.2 The harder cases need lineage, and that part is prototype

Be clear about the limit, because Section 6's repro is exactly the hard case. When the non-additivity is buried upstream, as it is in SUM(active_users) where active_users was a distinct count, reading the measure expression is not enough. Catching it means following the lineage of active_users back to the COUNT(DISTINCT) that produced it. That upstream-provenance walk is a prototype in our auditor today, not a shipped capability. We have not shipped a button that catches the pre-aggregated case end to end.

What survives is the principle, and it survives precisely because it is honest about the boundary: additivity is derivable by reading the code that built the metric, at the expression level now and at the lineage level as that work lands. A catalog entry cannot derive it at either level, because the catalog never sees the code. Declared is what a catalog can do. Derived is what reading the transformation can do.

7.3 A compiler in the loop for data agents

This matters more as the consumer changes. A human building one dashboard might know not to sum daily active users. An AI agent querying a governed metric view does not. It will group by whatever grain the question implies, including the monthly rollup that nobody validated, and it will return the inflated number with full confidence and a governed source to cite. Define-once makes the wrong number consistent and reusable everywhere, which is worse than a one-off mistake, not better.

The fix is not a better label. It is a derivation of the property from the code, plus a check that runs before the answer ships. That is what a compiler in the loop adds and a catalog cannot: Typedef is building that compiler for data agents. Metric views are a good place to define a metric. They are not, on their own, the place that tells you the metric is safe to add.

8. FAQ

Do Unity Catalog metric views double-count distinct measures? Not on raw base data. A COUNT(DISTINCT) measure defined over raw rows is recomputed at query time, so it returns the correct distinct count at any grain. The double-counting shows up in three specific cases: a metric view built on a source that already collapsed the grain (the Section 6 repro), the BI compatibility rewrite and client-side grand totals that Databricks' own docs warn about, and semi-additive measures that depend on a human filling in a field. The measure being "governed" does not change any of this.

I model everything in dbt. Should I use dbt MetricFlow or Unity Catalog metric views, and do I need both? If you need definitions that work across warehouses, MetricFlow is the portable option. If you are Databricks-centric and want the deepest integration, including Genie reading your metrics, Unity Catalog metric views give you that. Plenty of teams will run both: dbt builds and governs the tables, and metric views expose governed aggregations on top. Neither one derives additivity for you. Both make it a field you declare.

Does Genie use metric views, and did you test it? Genie reads metric views as governed context, per Databricks. We did not test Genie, and we did not test any private-preview capability. Nothing in this post is based on probing Databricks internals. The repro in Section 6 is something you run on your own Databricks workspace, and the receipts are Databricks' own public documentation.

What is time-additivity? Whether a measure can be correctly summed across time. Revenue is additive: monthly revenue is the sum of daily revenue. A distinct user count is not: monthly active users is not the sum of daily active users, because the same user shows up on multiple days. Semi-additive measures sit in between, summable across some dimensions but not time.