Moving from GoodData LDM to Databricks Metric Views requires translating MAQL to Spark SQL, rebuilding dataset relationships as join specifications, and implementing metrics in Unity Catalog. GoodData's proprietary YAML/MAQL definitions become Databricks-native YAML specifications executed through Spark SQL. This guide provides the technical steps for migration while maintaining governed metric definitions and data consistency.
Executive Summary: Migration Approach
The Core Translation
GoodData structures metrics through:
- LDM (Logical Data Model): Datasets with facts, attributes, and relationships
- MAQL: Proprietary query language for reusable metrics
- Workspaces: Tenant-level isolation for multi-tenancy
Databricks structures metrics through:
- Unity Catalog objects: First-class metric views alongside tables
- Spark SQL expressions: Native Spark SQL for metric calculations
- MEASURE clause: SQL syntax for querying metrics
Migration Path
The migration follows this sequence:
- Map GoodData datasets to Delta Lake tables
- Translate relationship definitions to join specifications
- Convert MAQL metrics to Spark SQL expressions
- Register metric views in Unity Catalog
- Implement governance through Unity Catalog RBAC
Key Architectural Shift
| Component | GoodData | Databricks |
|---|---|---|
| Metric language | MAQL (proprietary) | Spark SQL (standard) |
| Execution | MAQL interpreter → vendor SQL | Native Spark SQL + Photon |
| Query syntax | GoodData APIs | MEASURE() clause |
| Storage | YAML or API-managed | Unity Catalog objects |
| Isolation | Workspaces | Schemas + RBAC |
Step 1: Map GoodData LDM to Delta Lake Schema
Export LDM structure
Extract your complete LDM configuration from GoodData. The LDM contains datasets (boxes), relationships (arrows), and metric definitions written in MAQL.
Each dataset contains:
- Facts: Numeric columns for aggregation (order_amount, quantity)
- Attributes: Categorical dimensions (region, category, status)
- Primary keys: Identify unique records
- Foreign keys: Link to other datasets
Relationships define join paths:
- 1:N arrows: One parent record to many child records
- M:N arrows: Many-to-many through bridge tables
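For the M:N case, the bridge dataset becomes its own Delta table that both sides join through. A minimal sketch, with hypothetical table and column names:
```sql
-- Hypothetical bridge table for a many-to-many relationship
-- between orders and promotions
CREATE TABLE semantic.bridge_order_promotions (
  order_id     STRING NOT NULL,
  promotion_id STRING NOT NULL,
  CONSTRAINT pk_order_promotions PRIMARY KEY (order_id, promotion_id)
) USING DELTA;
```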
Create Delta Lake tables
Map each GoodData dataset to a Delta Lake table:
Fact datasets → Fact tables in Delta Lake:
```sql
CREATE TABLE semantic.fact_orders (
  order_id STRING NOT NULL,
  customer_id STRING NOT NULL,
  product_id STRING NOT NULL,
  order_date DATE NOT NULL,
  order_amount DECIMAL(18,2),
  quantity INT,
  CONSTRAINT pk_orders PRIMARY KEY (order_id)
) USING DELTA
LOCATION 'dbfs:/semantic/fact_orders';
```
Dimension datasets → Dimension tables:
```sql
CREATE TABLE semantic.dim_customers (
  customer_id STRING NOT NULL,
  customer_name STRING,
  region STRING,
  tier STRING,
  CONSTRAINT pk_customers PRIMARY KEY (customer_id)
) USING DELTA
LOCATION 'dbfs:/semantic/dim_customers';
```
Date datasets → Date dimension:
```sql
CREATE TABLE semantic.dim_dates (
  date_key DATE NOT NULL,
  year INT,
  quarter INT,
  month INT,
  day INT,
  day_of_week INT,
  week_of_year INT,
  month_start_date DATE,
  quarter_start_date DATE,
  CONSTRAINT pk_dates PRIMARY KEY (date_key)
) USING DELTA
LOCATION 'dbfs:/semantic/dim_dates';
```
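The date dimension can be populated directly in Spark SQL from a generated date range; a sketch (the date span is illustrative):
```sql
INSERT INTO semantic.dim_dates
SELECT
  d                                      AS date_key,
  year(d)                                AS year,
  quarter(d)                             AS quarter,
  month(d)                               AS month,
  day(d)                                 AS day,
  dayofweek(d)                           AS day_of_week,
  weekofyear(d)                          AS week_of_year,
  CAST(date_trunc('month', d) AS DATE)   AS month_start_date,
  CAST(date_trunc('quarter', d) AS DATE) AS quarter_start_date
FROM (
  -- Illustrative range; adjust to cover your historical and planning horizon
  SELECT explode(sequence(DATE'2020-01-01', DATE'2030-12-31', INTERVAL 1 DAY)) AS d
);
```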
Translate relationships to join specifications
GoodData relationship arrows become explicit join definitions in metric view YAML:
GoodData LDM:
Orders → Customers (via customer_id)
Orders → Products (via product_id)
Orders → Dates (via order_date)
Databricks YAML:
```yaml
joins:
  - table: semantic.dim_customers
    on: fact_orders.customer_id = dim_customers.customer_id
  - table: semantic.dim_products
    on: fact_orders.product_id = dim_products.product_id
  - table: semantic.dim_dates
    on: fact_orders.order_date = dim_dates.date_key
```
Multi-hop joins (Orders → Stores → Regions):
```yaml
joins:
  - table: semantic.dim_stores
    on: fact_orders.store_id = dim_stores.store_id
  - table: semantic.dim_regions
    on: dim_stores.region_id = dim_regions.region_id
```
Step 2: Translate MAQL Metrics to Spark SQL
Simple aggregations
GoodData MAQL:
SELECT SUM({fact/order_amount})
Databricks YAML:
```yaml
measures:
  - name: total_revenue
    expr: SUM(order_amount)
    description: "Sum of all order amounts"
```
GoodData MAQL with filter:
SELECT SUM({fact/order_amount}) WHERE {label/order_status} = "Completed"
Databricks YAML:
```yaml
measures:
  - name: completed_revenue
    expr: SUM(CASE WHEN order_status = 'Completed' THEN order_amount ELSE 0 END)
    description: "Revenue from completed orders only"
```
Ratio metrics
MAQL ratio metrics require translation to preserve correct aggregation semantics:
GoodData MAQL:
SELECT SUM({fact/order_amount}) / COUNT({fact/order_id}, ALL)
Databricks YAML:
```yaml
measures:
  - name: avg_order_value
    expr: SUM(order_amount) / COUNT(DISTINCT order_id)
    description: "Average revenue per order"
```
The Databricks expression computes the ratio after aggregation, ensuring correct results regardless of grouping dimensions.
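For example, the same avg_order_value measure returns a correct per-group ratio whether it is grouped by region, by category, or both (using the MEASURE() syntax covered in Step 4):
```sql
SELECT
  customer_region,
  product_category,
  MEASURE(avg_order_value) AS aov
FROM semantic.sales_metrics
GROUP BY customer_region, product_category;
```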
Derived metrics
Metrics referencing other metrics require restructuring:
GoodData MAQL:
SELECT {metric/revenue} - {metric/cost}
Databricks YAML:
```yaml
measures:
  - name: revenue
    expr: SUM(order_amount)
  - name: cost
    expr: SUM(unit_cost * quantity)
  - name: profit
    expr: revenue - cost
    description: "Profit computed from revenue minus cost"
```
Databricks resolves measure references automatically, computing base measures before applying derived calculations.
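Assuming these measures are registered in a metric view such as semantic.sales_metrics, a query against the derived measure reads like any other:
```sql
SELECT
  customer_region,
  MEASURE(revenue) AS revenue,
  MEASURE(cost) AS cost,
  MEASURE(profit) AS profit
FROM semantic.sales_metrics
GROUP BY customer_region;
```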
MAQL BY clause translation
MAQL's BY clause changes the aggregation context. Translate it to Spark SQL window functions:
GoodData MAQL:
SELECT SUM({fact/amount}) BY {label/region}, ALL {label/state}
Databricks YAML:
```yaml
measures:
  - name: regional_total
    expr: SUM(SUM(amount)) OVER (PARTITION BY region)
    description: "Total aggregated to region level"
```
MAQL FOR clause translation
GoodData's FOR clause applies temporary filters. Use CASE statements:
GoodData MAQL:
SELECT {metric/revenue} FOR {label/region} = "West"
Databricks YAML:
```yaml
measures:
  - name: west_region_revenue
    expr: SUM(CASE WHEN region = 'West' THEN order_amount ELSE 0 END)
    description: "Revenue for West region only"
```
MAQL ALL operator
The ALL operator in MAQL ignores certain dimensions. Translate it to subqueries or window functions:
GoodData MAQL:
SELECT SUM({fact/amount}) / SUM({fact/amount}) BY ALL {label/category}
Databricks alternative:
```yaml
measures:
  - name: category_pct_of_total
    expr: SUM(amount) / SUM(SUM(amount)) OVER ()
    description: "Category amount as percentage of total"
```
Step 3: Create Metric Views in Unity Catalog
Define metric view YAML
Structure complete metric view definitions in YAML:
```yaml
metric_view:
  name: sales_metrics
  description: "Core sales and revenue metrics"
  source_table: semantic.fact_orders
  joins:
    - table: semantic.dim_customers
      on: fact_orders.customer_id = dim_customers.customer_id
    - table: semantic.dim_products
      on: fact_orders.product_id = dim_products.product_id
    - table: semantic.dim_dates
      on: fact_orders.order_date = dim_dates.date_key
  measures:
    - name: total_revenue
      expr: SUM(order_amount)
      description: "Sum of all order amounts"
    - name: order_count
      expr: COUNT(DISTINCT order_id)
      description: "Count of distinct orders"
    - name: avg_order_value
      expr: SUM(order_amount) / COUNT(DISTINCT order_id)
      description: "Average revenue per order"
    - name: unique_customers
      expr: COUNT(DISTINCT customer_id)
      description: "Count of distinct customers"
  dimensions:
    - name: customer_region
      expr: dim_customers.region
      description: "Customer geographic region"
    - name: product_category
      expr: dim_products.category
      description: "Product category"
    - name: order_date
      expr: fact_orders.order_date
      description: "Date of order placement"
    - name: order_month
      expr: DATE_TRUNC('month', fact_orders.order_date)
      description: "Month of order placement"
```
Register metric view
Upload YAML and register in Unity Catalog:
```sql
-- Upload to volume
PUT 'file:///local/path/sales_metrics.yaml'
  INTO '/Volumes/semantic/definitions/sales_metrics.yaml';

-- Create metric view from YAML
CREATE METRIC VIEW semantic.sales_metrics
FROM '/Volumes/semantic/definitions/sales_metrics.yaml';
```
Alternative inline SQL DDL:
```sql
CREATE METRIC VIEW semantic.sales_metrics AS
SELECT
  SUM(fo.order_amount) AS total_revenue,
  COUNT(DISTINCT fo.order_id) AS order_count,
  SUM(fo.order_amount) / COUNT(DISTINCT fo.order_id) AS avg_order_value,
  dc.region AS customer_region,
  dp.category AS product_category,
  fo.order_date AS order_date
FROM semantic.fact_orders fo
JOIN semantic.dim_customers dc ON fo.customer_id = dc.customer_id
JOIN semantic.dim_products dp ON fo.product_id = dp.product_id;
```
Configure permissions
Implement Unity Catalog RBAC:
```sql
-- Grant SELECT on the metric view
GRANT SELECT ON METRIC VIEW semantic.sales_metrics TO `analysts`;

-- Grant MODIFY for maintainers
GRANT MODIFY ON METRIC VIEW semantic.sales_metrics TO `data_engineers`;

-- Users only need SELECT on the metric view, not the underlying tables
```
Step 4: Query Metric Views
Basic queries
Query metrics using the MEASURE() clause:
```sql
SELECT
  customer_region,
  MEASURE(total_revenue) AS revenue,
  MEASURE(order_count) AS orders
FROM semantic.sales_metrics
GROUP BY customer_region;
```
Filtered queries
Apply standard SQL WHERE clauses:
```sql
SELECT
  product_category,
  MEASURE(total_revenue) AS revenue
FROM semantic.sales_metrics
WHERE order_date >= '2025-01-01'
  AND customer_region = 'West'
GROUP BY product_category;
```
Multi-dimensional analysis
Slice metrics across multiple dimensions:
```sql
SELECT
  order_month,
  customer_region,
  product_category,
  MEASURE(total_revenue) AS revenue,
  MEASURE(avg_order_value) AS aov
FROM semantic.sales_metrics
WHERE order_date >= '2025-01-01'
GROUP BY order_month, customer_region, product_category
ORDER BY order_month, revenue DESC;
```
Time-series analysis
Leverage date dimensions for time-based queries:
```sql
SELECT
  order_month,
  MEASURE(total_revenue) AS revenue,
  LAG(MEASURE(total_revenue)) OVER (ORDER BY order_month) AS prev_month_revenue,
  (MEASURE(total_revenue) - LAG(MEASURE(total_revenue)) OVER (ORDER BY order_month))
    / LAG(MEASURE(total_revenue)) OVER (ORDER BY order_month) * 100 AS growth_pct
FROM semantic.sales_metrics
WHERE order_date >= '2024-01-01'
GROUP BY order_month
ORDER BY order_month;
```
Step 5: Implement Governance
Certification
Mark production-ready metrics as certified:
```sql
ALTER METRIC VIEW semantic.sales_metrics SET TBLPROPERTIES (
  'certified' = 'true',
  'owner' = 'finance_team',
  'certification_date' = '2025-01-15'
);
```
Certified metrics display trust badges in Unity Catalog UI.
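To confirm the properties were applied, the standard property inspection command should work against the metric view as well; a quick check:
```sql
-- Inspect properties set on the metric view
SHOW TBLPROPERTIES semantic.sales_metrics;
```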
Tagging
Apply organizational tags:
```sql
ALTER METRIC VIEW semantic.sales_metrics SET TAGS (
  'domain' = 'finance',
  'pii' = 'false',
  'update_frequency' = 'daily',
  'data_source' = 'erp'
);
```
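Tags can then drive discovery. Assuming the Unity Catalog information schema tag views are available in your workspace, an inventory of finance-domain objects might look like:
```sql
SELECT table_name, tag_name, tag_value
FROM system.information_schema.table_tags
WHERE schema_name = 'semantic'
  AND tag_name = 'domain'
  AND tag_value = 'finance';
```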
Documentation
Document metrics with business context:
```yaml
measures:
  - name: total_revenue
    expr: SUM(order_amount)
    description: |
      Total order revenue before discounts and refunds.
      Excludes cancelled and pending orders.

      Source: ERP system via nightly batch load.
      Update frequency: Daily at 2 AM UTC.
      Historical data: Available from 2020-01-01.

      Business owner: finance@company.com
      Technical owner: data-eng@company.com
```
Lineage
Unity Catalog tracks lineage automatically:
```sql
-- View lineage
SELECT *
FROM system.access.table_lineage
WHERE target_name = 'semantic.sales_metrics';
```
Step 6: Handle Multi-Tenancy Patterns
GoodData workspaces provide tenant isolation. Implement equivalent patterns in Databricks:
Approach 1: Separate schemas
Create tenant-specific schemas:
```sql
-- Tenant A
CREATE SCHEMA semantic.tenant_a;
CREATE METRIC VIEW semantic.tenant_a.sales_metrics
FROM '/Volumes/semantic/definitions/tenant_a_sales.yaml';

-- Tenant B
CREATE SCHEMA semantic.tenant_b;
CREATE METRIC VIEW semantic.tenant_b.sales_metrics
FROM '/Volumes/semantic/definitions/tenant_b_sales.yaml';
```
Approach 2: Row-level security
Single metric view with tenant filtering:
```sql
CREATE METRIC VIEW semantic.multi_tenant_metrics AS
SELECT
  SUM(order_amount) AS total_revenue,
  COUNT(DISTINCT order_id) AS order_count,
  tenant_id,
  customer_region
FROM semantic.fact_orders
WHERE tenant_id = current_user_tenant_id();
```
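Note that current_user_tenant_id() is not a built-in function. One way to supply it is a SQL UDF over a user-to-tenant mapping table; a sketch with hypothetical names, assuming your runtime supports scalar subqueries in SQL functions:
```sql
-- Hypothetical mapping of users to tenants
CREATE TABLE semantic.user_tenant_map (
  user_email STRING NOT NULL,
  tenant_id  STRING NOT NULL
) USING DELTA;

-- Resolve the calling user's tenant at query time
CREATE FUNCTION semantic.current_user_tenant_id()
RETURNS STRING
RETURN (
  SELECT tenant_id
  FROM semantic.user_tenant_map
  WHERE user_email = current_user()
);
```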
Approach 3: Separate metric views
Multiple metric view instances on shared data:
```yaml
# tenant_a_metrics.yaml
metric_view:
  name: tenant_a_sales_metrics
  source_table: semantic.fact_orders
  filters:
    - tenant_id = 'tenant_a'
  measures:
    - name: total_revenue
      expr: SUM(order_amount)
```
Step 7: Validate Metric Outputs with Fenic
Configure Fenic session
Set up Fenic for validation pipelines:
```python
import fenic as fc

session = fc.Session.get_or_create(
    fc.SessionConfig(
        app_name="metric_validation",
        semantic=fc.SemanticConfig(
            language_models={
                "flash": fc.GoogleDeveloperLanguageModel(
                    model_name="gemini-2.0-flash",
                    rpm=100,
                    tpm=1000
                )
            },
            default_language_model="flash"
        )
    )
)
```
Read source data exports
Load data exports for validation:
```python
# Read source system data
source_data = session.read.csv("data/exports/orders.csv")

# Transform and prepare
prepared = source_data.select(
    fc.col("order_id"),
    fc.col("order_date").cast(fc.DateType()),
    fc.col("customer_id"),
    fc.col("order_amount").cast(fc.DoubleType()),
    fc.col("region")
).filter(
    fc.col("order_amount") > 0
)
```
Compare metric outputs
Validate Databricks metrics against expected values:
```python
# Continues from the Fenic session configured above

# Read Databricks results (exported to CSV for validation)
databricks_metrics = session.read.csv(
    "data/databricks_output/revenue_by_region.csv"
)

# Read expected values
expected_metrics = session.read.csv(
    "data/expected/revenue_by_region.csv"
)

# Compare actual vs. expected revenue per region
comparison = databricks_metrics.join(
    expected_metrics,
    left_on=fc.col("customer_region"),
    right_on=fc.col("region"),
    how="inner"
).with_column(
    "variance",
    fc.col("revenue") - fc.col("expected_revenue")
).with_column(
    "abs_variance",
    fc.when(fc.col("variance") >= 0, fc.col("variance"))
      .otherwise(-fc.col("variance"))
).with_column(
    "variance_pct",
    fc.col("abs_variance") / fc.col("expected_revenue") * 100
).select(
    fc.col("customer_region"),
    fc.col("revenue").alias("actual_revenue"),
    fc.col("expected_revenue"),
    fc.col("variance"),
    fc.col("variance_pct")
)

# Flag variances above 0.1%
issues = comparison.filter(fc.col("variance_pct") > 0.1)
issues.show()
```
Semantic validation
Use Fenic's semantic operators to validate metric logic:
```python
from typing import List

from pydantic import BaseModel, Field


class MetricValidation(BaseModel):
    metric_name: str = Field(description="Name of the metric")
    logic_correct: bool = Field(description="Whether the metric logic is correct")
    issues: List[str] = Field(description="List of any issues found")


# Load metric definition documents (exported as JSON) for review
metric_docs = session.read.docs(
    "data/metric_definitions.json",
    content_type="json"
)

# read.docs exposes each document's text in the "content" column
validated = metric_docs.with_column(
    "validation",
    fc.semantic.extract(
        fc.col("content"),
        MetricValidation
    )
).unnest("validation")

# Review metrics with issues
problems = validated.filter(fc.col("logic_correct") == False)
problems.show()
```
Step 8: Optimize Performance
Enable Photon
Configure SQL warehouses with Photon acceleration:
```sql
CREATE WAREHOUSE semantic_warehouse
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_STOP = 15
  ENABLE_PHOTON = TRUE;
```
Optimize table layout
Apply Z-ORDER clustering on commonly filtered columns:
```sql
OPTIMIZE semantic.fact_orders
ZORDER BY (customer_id, order_date, product_id);
```
Maintain table statistics
Update statistics for query optimization:
```sql
ANALYZE TABLE semantic.fact_orders COMPUTE STATISTICS;
ANALYZE TABLE semantic.dim_customers COMPUTE STATISTICS;
ANALYZE TABLE semantic.dim_products COMPUTE STATISTICS;
```
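Column-level statistics on frequently filtered and joined columns can further improve join ordering and filter selectivity estimates:
```sql
ANALYZE TABLE semantic.fact_orders
COMPUTE STATISTICS FOR COLUMNS customer_id, product_id, order_date;
```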
Materialized aggregations
Create materialized views for frequently queried aggregations:
```sql
CREATE MATERIALIZED VIEW semantic.monthly_revenue AS
SELECT
  DATE_TRUNC('month', order_date) AS month,
  customer_region,
  MEASURE(total_revenue) AS revenue,
  MEASURE(order_count) AS orders
FROM semantic.sales_metrics
GROUP BY month, customer_region;
```
Common Translation Challenges
Logical normalization
GoodData's attribute hierarchies (State → Region) must be explicitly modeled:
GoodData: Implicit hierarchy in the LDM
Databricks: Explicit dimension table
```sql
CREATE TABLE semantic.dim_geography (
  state_code STRING NOT NULL,
  state_name STRING,
  region_code STRING,
  region_name STRING,
  PRIMARY KEY (state_code)
) USING DELTA;
```
Reference in metric view:
```yaml
dimensions:
  - name: state
    expr: dim_geography.state_name
  - name: region
    expr: dim_geography.region_name
```
Context-free metric reusability
GoodData metrics automatically adapt to query dimensions. Databricks metrics work similarly but require explicit dimension definitions:
GoodData: Single metric works across all dimensions
Databricks: Same behavior through MEASURE() clause
Both systems achieve context-free reusability. The difference is syntactic, not functional.
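For example, the total_revenue measure defined once in Step 3 can be queried at any grain without redefinition:
```sql
-- Regional grain
SELECT customer_region, MEASURE(total_revenue) AS revenue
FROM semantic.sales_metrics
GROUP BY customer_region;

-- Monthly grain per product category
SELECT order_month, product_category, MEASURE(total_revenue) AS revenue
FROM semantic.sales_metrics
GROUP BY order_month, product_category;
```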
Workspace-level filtering
GoodData workspaces isolate data. Databricks requires explicit filtering:
```yaml
metric_view:
  name: filtered_metrics
  source_table: semantic.fact_orders
  filters:
    - workspace_id = current_workspace()
  measures:
    - name: total_revenue
      expr: SUM(order_amount)
```
Or implement through views:
```sql
CREATE VIEW semantic.workspace_a_orders AS
SELECT *
FROM semantic.fact_orders
WHERE workspace_id = 'workspace_a';
```
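Access to each per-workspace view is then granted to the matching group (the group name here is hypothetical):
```sql
GRANT SELECT ON VIEW semantic.workspace_a_orders TO `workspace_a_users`;
```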
Monitoring Metric Usage
Query audit logs
Track metric view queries:
```sql
SELECT
  request_params.metric_view AS metric_view,
  COUNT(*) AS query_count,
  AVG(request_params.execution_time_ms) AS avg_duration_ms,
  COUNT(DISTINCT user_identity.email) AS unique_users
FROM system.access.audit
WHERE action_name = 'executeMeasure'
  AND request_params.metric_view IS NOT NULL
  AND event_date >= CURRENT_DATE() - INTERVAL 30 DAYS
GROUP BY metric_view
ORDER BY query_count DESC;
```
Performance metrics
Identify slow queries:
```sql
SELECT
  request_params.metric_view,
  request_params.execution_time_ms,
  request_params.query_text,
  user_identity.email
FROM system.access.audit
WHERE action_name = 'executeMeasure'
  AND request_params.execution_time_ms > 10000
  AND event_date >= CURRENT_DATE() - INTERVAL 7 DAYS
ORDER BY request_params.execution_time_ms DESC;
```
Usage patterns
Analyze dimension combinations:
```sql
SELECT
  request_params.dimensions_used,
  COUNT(*) AS frequency
FROM system.access.audit
WHERE action_name = 'executeMeasure'
  AND event_date >= CURRENT_DATE() - INTERVAL 30 DAYS
GROUP BY request_params.dimensions_used
ORDER BY frequency DESC;
```
Migration Validation Checklist
Data preparation:
- Export complete LDM structure
- Document all dataset relationships
- Extract all MAQL metric definitions
- Map datasets to Delta Lake tables
- Create star/snowflake schema
Metric translation:
- Convert simple aggregations
- Translate ratio metrics
- Convert derived metrics
- Handle MAQL BY clauses
- Translate MAQL FOR clauses
- Convert MAQL ALL operators
Implementation:
- Create metric view YAML definitions
- Register metric views in Unity Catalog
- Configure RBAC permissions
- Set up certification and tags
- Document business context
Validation:
- Compare metric outputs
- Validate multi-dimensional queries
- Test time-series calculations
- Verify filtered metric results
- Check derived metric accuracy
Performance:
- Enable Photon acceleration
- Apply Z-ORDER clustering
- Update table statistics
- Create materialized views for common queries
- Benchmark query response times
Monitoring:
- Set up query audit tracking
- Monitor performance metrics
- Track usage patterns
- Identify optimization opportunities
Conclusion
Migrating from GoodData LDM to Databricks Unity Catalog Metric Views translates proprietary MAQL definitions to standard Spark SQL while maintaining metric consistency and governance. The lakehouse-native approach unifies metrics across BI, data science, and ML workloads through Unity Catalog's first-class metric objects.
The migration requires systematic translation of dataset relationships to join specifications, MAQL expressions to Spark SQL, and workspace isolation to Unity Catalog RBAC. Fenic provides reliable semantic operators for validation pipelines during migration.
Organizations gain performance through Photon acceleration, governance through Unity Catalog, and flexibility through standard Spark SQL. The result is a scalable semantic layer integrated with the broader Databricks ecosystem.
For additional resources on semantic data processing, see semantic processing statistics and DataFrame semantic operations. The Typedef blog provides ongoing insights into modern data infrastructure patterns.
