What Is Genie Ontology? Databricks' Continuously Learned Context Layer Explained

Genie Ontology is the context layer Databricks announced this summer for its AI agents. It is a continuously learned graph of how your business works, and it ranks the most authoritative definition of each business term. But ranking the most trusted definition is not the same as checking the number an agent computes from it, and three gaps remain. The context the graph builds can be incomplete. The ranking can put a popular definition on top that is wrong for the question. And whether a calculation is valid is decided in the transformation code, which a popularity graph never reads.

1. What Genie Ontology is, in one paragraph

At its Data + AI Summit in June 2026, Databricks announced Genie Ontology. It is a context layer for AI agents that run on Databricks. It builds a graph of how your company works, learned from your Databricks data and from more than 50 connected applications, and it keeps that graph up to date as the data changes. When an agent answers a question, Genie Ontology gives it the business meaning behind the data and ranks the most authoritative definition to use. Databricks describes the ranking as an approach similar to PageRank, and at the keynote its team called it OntoRank. Genie Ontology is in preview. The agent it powers, Genie One, is generally available, along with Genie Agents and Genie Code. This post explains what Genie Ontology does and credits the part it gets right. It then explains the one problem that better context cannot fix. Ranking the most trusted definition of a metric is not the same as checking whether the number an agent computes from it is correct.

2. What Genie Ontology actually does

2.1 The sources it learns from

Genie Ontology does not wait for someone to write its knowledge by hand. It reads the sources you already have and builds the graph itself. Those sources include your Databricks tables, the queries people run, the dashboards they build, and the pipelines that produce the tables. They also include more than 50 connected applications, such as Slack, Jira, Google Drive, SharePoint, and Salesforce. From all of this it pulls out pieces of knowledge, such as what a term means, how a metric is defined, and which tables relate to which. It organizes those pieces into a graph of how the company works and what its data means. Databricks calls this a self-improving knowledge graph: it updates as the underlying sources change rather than staying fixed.
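The following is a minimal Python sketch of the extraction step. The graph stores each extracted fact together with the source it came from, so a fact can be dropped and re-learned when that source changes. The names `Fact` and `KnowledgeGraph` and the example sources are hypothetical. Databricks has not published Genie Ontology's internal schema, and this is not its implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One extracted piece of knowledge, tagged with where it came from (hypothetical schema)."""
    subject: str   # a business term or table name
    relation: str  # e.g. "defined_as" or "joins_to"
    obj: str
    source: str    # the table, dashboard, or app the fact was read from


class KnowledgeGraph:
    """Toy graph: facts indexed by subject, each keeping its provenance."""

    def __init__(self):
        self.facts = defaultdict(set)

    def add(self, fact: Fact) -> None:
        self.facts[fact.subject].add(fact)

    def forget_source(self, source: str) -> None:
        # When a source changes, drop what it contributed so it can be re-extracted.
        for subject in list(self.facts):
            self.facts[subject] = {f for f in self.facts[subject] if f.source != source}

    def about(self, subject: str) -> list:
        return sorted(self.facts.get(subject, set()), key=lambda f: (f.relation, f.obj, f.source))


kg = KnowledgeGraph()
kg.add(Fact("revenue", "defined_as", "SUM(amount) - SUM(refunds)", "finance_wiki"))
kg.add(Fact("revenue", "defined_as", "SUM(booked_amount)", "sales_dashboard"))
kg.add(Fact("orders", "joins_to", "customers", "etl_pipeline"))
print(len(kg.about("revenue")))  # 2
kg.forget_source("finance_wiki")
print([f.source for f in kg.about("revenue")])  # ['sales_dashboard']
```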

2.2 OntoRank: ranking the most authoritative definition

The hard part of business context is that the same word can have several definitions. Finance and sales may each define revenue in their own way. Two teams may count active users with different rules. An agent that picks the wrong definition gives a confident wrong answer. Genie Ontology works on this problem by ranking the definitions it finds, so the most trusted one comes first.

Databricks describes the ranking as an approach similar to PageRank. PageRank ordered web pages by how other pages linked to them. Genie Ontology orders definitions by how trustworthy they look across your company. Databricks says the ranking weighs five things:

Where a definition came from.
The authority of the person or source that created it.
How often people rely on it.
How closely it ties to certified and widely used data assets.
How fresh it is.

It also checks permissions, so an agent only sees the definitions the current user is allowed to see. When several definitions exist, Genie Ontology puts forward the one with the strongest trust signals.
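Here is a minimal sketch of how the five trust signals and the permission check could combine into one ranking. Databricks names the signals but does not publish how it weighs them. The `WEIGHTS` table, the [0, 1] signal scale, and the group-based permission model below are therefore all assumptions. A plain weighted sum also stands in for the PageRank-style method Databricks describes, so treat this as an illustration of the idea, not as OntoRank.

```python
from dataclasses import dataclass, field

# Hypothetical weights. Databricks names these five signals but does not publish
# how OntoRank combines them; a plain weighted sum stands in for the real method.
WEIGHTS = {
    "provenance": 0.20,
    "creator_authority": 0.20,
    "usage": 0.25,
    "certified_links": 0.25,
    "freshness": 0.10,
}


@dataclass
class Definition:
    name: str
    signals: dict                                   # each signal normalized to [0, 1]
    allowed_groups: set = field(default_factory=set)


def trust_score(d: Definition) -> float:
    return sum(w * d.signals.get(k, 0.0) for k, w in WEIGHTS.items())


def rank_for_user(defs: list, user_groups: set) -> list:
    # Permission check first: drop definitions the user may not see, then rank the rest.
    visible = [d for d in defs if d.allowed_groups & user_groups]
    return sorted(visible, key=trust_score, reverse=True)


defs = [
    Definition("finance_revenue", dict.fromkeys(WEIGHTS, 0.9), {"finance"}),
    Definition("sales_revenue",
               {"provenance": 0.6, "creator_authority": 0.5, "usage": 0.9,
                "certified_links": 0.7, "freshness": 0.8},
               {"sales", "finance"}),
    Definition("wiki_revenue", dict.fromkeys(WEIGHTS, 0.3), {"sales", "finance"}),
]

print([d.name for d in rank_for_user(defs, {"sales"})])
# ['sales_revenue', 'wiki_revenue']
print([d.name for d in rank_for_user(defs, {"finance"})])
# ['finance_revenue', 'sales_revenue', 'wiki_revenue']
```

The finance-only definition scores highest, but a sales user never sees it. This is why the filter runs before the ranking.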

2.3 How Unity Catalog metric views feed it, as one governed input among many

One of the sources Genie Ontology learns from is your governed semantic layer in Unity Catalog. Databricks ships a feature called Unity Catalog Business Semantics, which includes metric views. A metric view is a governed definition of a metric, such as revenue or active users, that you define once and query from SQL, dashboards, and agents. Databricks states that this user-defined semantic foundation in Unity Catalog feeds the Genie Ontology.

Metric views are one curated input. They are not the whole graph. Genie Ontology also learns from the tables, queries, dashboards, pipelines, and the connected applications above. So the metric view is where your governed definitions live, and Genie reads it as one trusted input among many. We cover that metric layer separately in our explainer, What Are Metrics in Unity Catalog?

3. The part Databricks gets right: an ontology is more than RAG

Most context layers do one thing. They search your text for passages that look like the question and hand the top matches to the agent. This is retrieval, and on its own it cannot tell the difference between two definitions of revenue that both look relevant.
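The next example is a minimal sketch of retrieval-only scoring. Bag-of-words cosine similarity stands in here for the embedding search that real layers use, which is a simplification. The two revenue definitions are hypothetical. They score exactly the same against the question, so similarity alone cannot say which one is authoritative.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Missing tokens read as 0 from a Counter, so only shared tokens add to the dot product.
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


question = "What was revenue last quarter?"
definitions = {
    "finance": "Revenue: recognized revenue net of refunds per quarter",
    "sales": "Revenue: booked revenue from closed deals per quarter",
}

q = bag_of_words(question)
scores = {name: cosine(q, bag_of_words(text)) for name, text in definitions.items()}
print(scores)  # both definitions get the same score
```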

Genie Ontology does more than retrieval, and this is the part to credit plainly. It builds a graph and ranks definitions by how trusted they are, not only by how well their text matches the question. Working out which of several definitions is the authoritative one is a problem most layers ignore. The permission check is also useful, because it keeps an agent from answering with data the user is not allowed to see.

Databricks supports the claim with its own benchmark. It reports that Genie answered 84.5% of real-world questions correctly on the first try, compared with 52.4% for the strongest general-purpose coding agent. This is Databricks' own internal benchmark, run on its own platform, so read it as the vendor's number rather than an independent test. Even read that way, the gain is large, and ranked context clearly helps an agent answer hard questions. We should say so before we draw the line.

4. Why an ontology is not verification: three reasons an answer is still wrong

Now look at what the benchmark leaves out. If 84.5% of answers are correct, then 15.5%, roughly one in six, are still wrong. The agent does not mark these answers as uncertain, and it does not withhold them. It returns them in the same confident voice as the answers that are right, and it can cite a governed source for them.

There are three reasons an answer is still wrong even with Genie Ontology on, and they add up.

4.1 The context the ontology builds can be incomplete

The graph holds definitions and the relationships between them. It does not hold every rule that decides whether a calculation on a definition is valid. A metric can have a clear, agreed definition in the graph, and the graph can still be missing the constraint you need to use it correctly. The definition can be present and right, and the rule that protects it can be absent.

4.2 OntoRank can rank or surface the wrong definition

This reason lives inside Genie Ontology's own mechanism. Ranking by authority and popularity means the definition that most people use, or that ties to the most certified assets, rises to the top. The most popular definition is not always the right one for the question being asked. A ranking built from trust signals measures how trusted a definition is, not whether it fits this specific question. So the top-ranked definition can be a popular one that does not answer what the user actually asked, and a correct but less used definition can be ranked below it. A better ranking reduces how often this happens. It does not remove it.
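The following hypothetical sketch shows the gap between trust and fit. It orders two made-up revenue definitions by certification and usage. It then walks that order and returns the first definition that has the properties the question needs. The sketch assumes those required properties are already known, which is the hard part in practice. The point is only that the two orderings can disagree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    name: str
    usage_count: int
    certified: bool
    properties: frozenset  # hypothetical tags for what the definition measures


def rank_by_trust(defs):
    # Trust-style order: certified definitions first, then the most used.
    return sorted(defs, key=lambda d: (d.certified, d.usage_count), reverse=True)


def best_fit(defs, required: frozenset):
    # Walk the trust order and return the first definition that has every
    # property the question needs, or None if nothing fits.
    for d in rank_by_trust(defs):
        if required <= d.properties:
            return d
    return None


defs = [
    Definition("gross_revenue", 900, True, frozenset({"revenue", "gross"})),
    Definition("net_revenue", 40, False, frozenset({"revenue", "net_of_refunds"})),
]

# Question: "What was revenue after refunds last quarter?"
needed = frozenset({"revenue", "net_of_refunds"})
print(rank_by_trust(defs)[0].name)  # gross_revenue
print(best_fit(defs, needed).name)  # net_revenue
```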

4.3 Ranking the right definition is not checking the calculation

This is the hardest reason. Suppose Genie Ontology surfaces the correct, complete definition of a metric. The agent can still compute the wrong number, because the metric gets combined across time, or across levels of detail, in a way that is not valid. A definition tells the agent what a metric means. It does not tell the agent what math is safe to do with it. Whether a calculation is valid depends on how the metric was built in the transformation code that produced the data. Genie Ontology ranks meaning and serves it. It does not run the agent's query against the code that built the data, and it does not check the result.
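Here is a minimal sketch of the kind of rule a definition does not carry. The rule says whether a pre-aggregated column can be rolled up again to a coarser grain and still match a computation from raw rows. The `Built` enum and the `SAFE_ROLLUPS` table are illustrative assumptions, not any product's API, and they cover only a few common aggregates.

```python
from enum import Enum


class Built(Enum):
    """How a pre-aggregated column was produced in the transformation code."""
    SUM = "sum"
    COUNT = "count"
    COUNT_DISTINCT = "count_distinct"
    AVG = "avg"
    MAX = "max"
    MIN = "min"


# Re-aggregations over a coarser grain that give the same result as computing
# from the raw rows. Illustrative, not exhaustive.
SAFE_ROLLUPS = {
    Built.SUM: {"SUM"},
    Built.COUNT: {"SUM"},
    Built.COUNT_DISTINCT: set(),  # overlapping members would be counted more than once
    Built.AVG: set(),             # needs the underlying sum and count, not the average
    Built.MAX: {"MAX"},
    Built.MIN: {"MIN"},
}


def rollup_is_valid(built: Built, rollup: str) -> bool:
    return rollup.upper() in SAFE_ROLLUPS[built]


# The rollup text is identical in both calls; only how the column was built
# changes the verdict.
print(rollup_is_valid(Built.COUNT, "sum"))           # True
print(rollup_is_valid(Built.COUNT_DISTINCT, "sum"))  # False
```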

4.4 Two graphs that answer different questions

Genie Ontology is built from signals of authority and popularity, such as who created a definition and how widely people use it. That tells you which definition the company trusts most. It does not tell you which calculations on that definition are valid.

The thing that tells you which calculations are valid is a different graph. That graph is built from the transformation code that produced the data, and it carries the rule for what math each metric allows. Genie's graph answers which definition is most trusted. The second graph answers whether a calculation is valid. Genie Ontology has the first graph. Verification needs the second.

5. A wrong answer no authority ranking can fix

First, a disclosure. Genie Ontology is in preview. We have not tested it, and we make no claim about it. The example below is plain SQL that you can run on any engine. It shows the kind of error that any context layer leaves unfixed, whether the layer ranks definitions or not.

A company stores a daily count of active users. Each row holds a date and the number of distinct users active that day, computed upstream with a COUNT(DISTINCT user_id). Someone then defines monthly active users as the sum of the daily counts. That definition can be the most authoritative one in the company. It can be certified, widely used, and ranked first by OntoRank. It is still wrong to add daily distinct counts across a month, because a user who was active on more than one day is counted on each of those days.

Here is the setup and the two ways to compute the monthly number.

```sql
-- Three users, two months. Some users are active on more than one day.
CREATE TABLE raw_events (event_date DATE, user_id INT);
INSERT INTO raw_events VALUES
  ('2024-01-01', 1), ('2024-01-15', 1),  -- user 1 active twice in January
  ('2024-01-15', 2),
  ('2024-02-02', 1),
  ('2024-02-10', 3), ('2024-02-20', 3);  -- user 3 active twice in February

-- The upstream daily table: active_users is already a per-day COUNT(DISTINCT)
CREATE TABLE daily_active_users AS
SELECT event_date AS activity_date, COUNT(DISTINCT user_id) AS active_users
FROM raw_events
GROUP BY event_date;
```

```sql
-- WRONG: sum the daily distinct counts across the month
SELECT date_trunc('month', activity_date) AS month, SUM(active_users) AS mau
FROM daily_active_users
GROUP BY 1 ORDER BY 1;
-- January: 3, February: 3
```

```sql
-- CORRECT: count distinct users over the whole month from raw events
SELECT date_trunc('month', event_date) AS month, COUNT(DISTINCT user_id) AS mau
FROM raw_events
GROUP BY 1 ORDER BY 1;
-- January: 2, February: 2
```

Running both queries returns this:

```text
WRONG, the sum of daily distinct counts:
  month         mau
  2024-01-01      3
  2024-02-01      3

CORRECT, distinct users counted over the whole month:
  month         mau
  2024-01-01      2
  2024-02-01      2
```

The true monthly active users are two in January, users 1 and 2, and two in February, users 1 and 3. The sum reports three in each month, because it counts user 1 twice in January and user 3 twice in February. This output is verified on DuckDB v1.2.2. The error is in the transformation rather than the engine, so any SQL engine returns the same numbers, though some engines spell date_trunc differently.
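For readers without DuckDB, here is a minimal sketch that reproduces the same numbers with Python's built-in sqlite3 module. SQLite has no date_trunc, so strftime('%Y-%m', ...) stands in for the month truncation. As a result, the month labels print as 2024-01 rather than 2024-01-01.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_events (event_date TEXT, user_id INTEGER);
INSERT INTO raw_events VALUES
  ('2024-01-01', 1), ('2024-01-15', 1),
  ('2024-01-15', 2),
  ('2024-02-02', 1),
  ('2024-02-10', 3), ('2024-02-20', 3);
-- The daily table already holds a per-day COUNT(DISTINCT)
CREATE TABLE daily_active_users AS
SELECT event_date AS activity_date, COUNT(DISTINCT user_id) AS active_users
FROM raw_events GROUP BY event_date;
""")

# WRONG: sum the daily distinct counts across the month
wrong = conn.execute("""
SELECT strftime('%Y-%m', activity_date) AS month, SUM(active_users)
FROM daily_active_users GROUP BY 1 ORDER BY 1
""").fetchall()

# CORRECT: count distinct users over the whole month from raw events
correct = conn.execute("""
SELECT strftime('%Y-%m', event_date) AS month, COUNT(DISTINCT user_id)
FROM raw_events GROUP BY 1 ORDER BY 1
""").fetchall()

print(wrong)    # [('2024-01', 3), ('2024-02', 3)]
print(correct)  # [('2024-01', 2), ('2024-02', 2)]
```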

Now notice where the fact that breaks the answer lives. It is not in the question, which only asked for monthly active users. It is not in the definition, which is a plain sum and looks safe to add. It is not in the two tables, because one stores raw events and the other stores a daily count, and neither shows that adding the daily counts double-counts people. The COUNT(DISTINCT) that makes the sum invalid is a step in the transformation code between raw events and the daily table. You can tell the answer is wrong only from that transformation.

This is the error no authority ranking can fix. On Databricks, that same sum is what you would write as a Unity Catalog metric view, which is the governed object Genie reads as context:

```sql
-- The pre-aggregated SUM written as a Unity Catalog metric view.
-- This is the Databricks form of the WRONG query above. It is shown for
-- illustration and was not run on Databricks here. The verified numbers
-- come from the plain SQL above.
CREATE OR REPLACE VIEW mau_wrong WITH METRICS LANGUAGE YAML AS $$
version: 1.1
source: daily_active_users
dimensions:
  - name: month
    expr: DATE_TRUNC('MONTH', activity_date)
measures:
  - name: monthly_active_users
    expr: SUM(active_users)   # additive-looking, but it sums daily DISTINCT counts
$$;

SELECT month, MEASURE(monthly_active_users) AS mau
FROM mau_wrong
GROUP BY month ORDER BY month;
-- Returns the same 3 and 3 as the WRONG query above.
```

For the metric view syntax, see our explainer, What Are Metrics in Unity Catalog? The point here is what Genie does with this view. Genie Ontology can rank it as the most authoritative definition of monthly active users and serve it with full confidence, and the monthly number is still too high.

6. Two graphs, two questions: observed authority and derived validity

The difference between Genie Ontology and a check on the answer comes down to two graphs that answer different questions.

Genie's graph is built from observed authority and popularity. It answers which definition the company trusts most. The graph that catches the error in Section 5 is built from the transformation code, and it answers whether a calculation is valid. The first graph is about people and usage. The second graph is about the math the code makes legal.

A fair objection comes up here. Genie Ontology links definitions to certified assets, and Databricks builds lineage that records which columns feed which. Does that not already cover validity? It does not. Certification and authority links record who trusts a definition. Lineage records that one column feeds another. Neither of them carries the rule that a distinct count cannot be summed across time. Having the link, and having the lineage, is not the same as catching the error. The rule that protects the calculation comes from reading the transformation code itself, not from who trusts the definition or which column feeds which.

7. A compiler in the loop for data agents

The only way to turn a good guess into a checked answer is to verify the answer against the code that built the data. In practice this means working out the properties of each metric from how it was built, and checking the answer against those properties before it goes out.

Some of this ships today. When the unsafe pattern is visible in the measure expression, for example a COUNT(DISTINCT), an AVG, a MEDIAN, or a ratio that divides by a distinct count, Typedef reads the expression and automatically marks the measure as not safe to add. There is no field for a person to fill in, and no person in the loop. That part is shipped.
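The following is a minimal sketch of an expression-level check in this spirit. It matches the four cases named above with regular expressions, an assumption made for brevity. It is not Typedef's implementation, and a production checker would parse the SQL instead. Note that the Section 5 measure, SUM(active_users), passes this check. That is why the buried case in the next paragraph needs more than the expression.

```python
import re

# Hypothetical patterns for the four cases named above. A production checker
# would parse the SQL; regular expressions keep the sketch short.
NON_ADDITIVE_PATTERNS = [
    (re.compile(r"\bCOUNT\s*\(\s*DISTINCT\b", re.IGNORECASE), "COUNT(DISTINCT)"),
    (re.compile(r"\bAVG\s*\(", re.IGNORECASE), "AVG"),
    (re.compile(r"\bMEDIAN\s*\(", re.IGNORECASE), "MEDIAN"),
    (re.compile(r"/\s*COUNT\s*\(\s*DISTINCT\b", re.IGNORECASE), "ratio over COUNT(DISTINCT)"),
]


def non_additive_reasons(measure_expr: str) -> list:
    return [label for pattern, label in NON_ADDITIVE_PATTERNS if pattern.search(measure_expr)]


def is_safe_to_add(measure_expr: str) -> bool:
    return not non_additive_reasons(measure_expr)


print(is_safe_to_add("COUNT(DISTINCT user_id)"))  # False
print(non_additive_reasons("SUM(revenue) / COUNT(DISTINCT user_id)"))
# ['COUNT(DISTINCT)', 'ratio over COUNT(DISTINCT)']
print(is_safe_to_add("SUM(active_users)"))  # True: the buried case slips through
```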

Some of it is still being built. In the Section 5 example the problem is buried one layer up, in the transformation that turned raw events into a daily distinct count. Catching that case means following the metric's lineage back to the COUNT(DISTINCT) that produced the daily column. That upstream walk is a prototype in our auditor today, not a shipped button. We have not shipped a one-click catch for the buried case.
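Here is a minimal sketch of the upstream walk. It assumes a lineage graph in which each column records the expression that produced it. Plain column lineage would have the `inputs` edges but not the `expr` field, and without `expr` the walk finds nothing. The `Column` type and the column names are hypothetical. The rule is deliberately conservative: any COUNT(DISTINCT) upstream flags the SUM. That can raise false alarms, for example when the sum runs across groups that never share a user.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    expr: str                                   # expression that produced the column
    inputs: list = field(default_factory=list)  # upstream Column objects


def upstream_distinct_counts(col: Column) -> list:
    """Names of columns at or upstream of `col` whose expression is a COUNT(DISTINCT)."""
    found, stack, seen = [], [col], set()
    while stack:
        c = stack.pop()
        if c.name in seen:
            continue
        seen.add(c.name)
        if "COUNT(DISTINCT" in c.expr.upper().replace(" ", ""):
            found.append(c.name)
        stack.extend(c.inputs)
    return found


def sum_is_safe(measure: Column) -> bool:
    # Conservative rule: any COUNT(DISTINCT) upstream makes a SUM rollup suspect.
    return not upstream_distinct_counts(measure)


# The Section 5 lineage: raw events -> daily distinct count -> summed measure.
user_id = Column("raw_events.user_id", "user_id")
active_users = Column("daily_active_users.active_users", "COUNT(DISTINCT user_id)", [user_id])
mau = Column("mau_wrong.monthly_active_users", "SUM(active_users)", [active_users])

print(upstream_distinct_counts(mau))  # ['daily_active_users.active_users']
print(sum_is_safe(mau))               # False
```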

What holds in both cases is the principle. Whether a calculation is valid can be derived by reading the code that built the metric, at the level of the expression now and at the level of lineage as that work lands. A catalog entry or an ontology cannot derive it at either level, because neither one reads the transformation code. A label is something a person declares. A property is something you read from the code.

The problem grows when the reader of the metric is an agent rather than a person. A person building one dashboard might know not to sum daily active users. An agent querying a governed metric does not. It will group by whatever level of detail the question implies, including the monthly rollup that nobody checked, and it will return the inflated number with full confidence and a governed source to cite. Defining a metric once makes the wrong number consistent everywhere it is used, which is worse than a single mistake.

This is what we build at Typedef. Typedef is the compiler for data agents. It derives the properties of your metrics from the transformation code and checks an answer against those properties before it ships. You can read more at Typedef without booking a demo.

8. FAQ

Is Genie Ontology available yet, GA or preview? Genie Ontology is in preview as of June 2026. Genie One, Genie Agents, and Genie Code, the agent products it supports, are generally available, announced on June 16, 2026. The 84.5% accuracy figure describes the preview, and Genie Ontology works only with Databricks.

How is Genie Ontology different from Cortex Sense? Both are governed context layers for AI agents, and both were announced in the same season. Each one is tied to a single vendor's platform. Genie Ontology works with Databricks, and Cortex Sense works with Snowflake. Genie Ontology builds a graph and ranks definitions by trust. Cortex Sense retrieves definitions when a query arrives and reorders them by relevance. The deeper question is whether either one checks its answers or only adds context. There is a full Snowflake-side explainer in What Is Cortex Sense?

Does Genie Ontology verify an agent's answer? No. It ranks authoritative context and serves it to the agent. It does not run the agent's query against the code that built the data, and it does not check whether the result is correct. Databricks positions it as a context layer, and CIO's coverage of the launch quoted an analyst making the same point, that better context improves answers but does not guarantee they are correct.

How is Genie Ontology different from Typedef's typed graph? Genie's graph is built from signals of authority and popularity, such as who created a definition and how widely it is used, and it ranks the most trusted one. Typedef's graph is built from the transformation code, and it derives what math is valid on each metric. Genie's graph answers which definition is most trusted. Typedef's graph answers whether a calculation is valid.

Did Typedef test Genie? No. Genie Ontology is in preview, we make no claim about how it behaves in practice, and nothing in this post is based on probing Databricks internals. The example in Section 5 is plain SQL you run yourself, and the supporting facts come from Databricks' own public announcements and from the press coverage listed below.

Sources

Databricks, Introducing Genie One, Genie Agents, and Genie Ontology (blog, the announcement and the PageRank-style ranking)
Databricks, What's new with Unity Catalog at Data + AI Summit 2026 (blog, Unity Catalog semantics feeding the ontology)
Databricks, Databricks Launches Genie One (press release, the GA status of Genie One)
CIO, From RAG to ontology: Databricks bets on context as the key to trusted AI agents (analysis, the OntoRank signals and the context-is-not-correctness caveat)
ITdaily, Not pagerank, but ontorank (analysis, the OntoRank name and the 50-plus sources)