Moving from GoodData LDM to Databricks Metric Views requires translating MAQL to Spark SQL, rebuilding dataset relationships as join specifications, and implementing metrics in Unity Catalog. GoodData's proprietary YAML/MAQL definitions become Databricks-native YAML specifications executed through Spark SQL. This guide provides the technical steps for migration while maintaining governed metric definitions and data consistency.
Executive Summary: Migration Approach
The Core Translation
GoodData structures metrics through:
- LDM (Logical Data Model): Datasets with facts, attributes, and relationships
- MAQL: Proprietary query language for reusable metrics
- Workspaces: Tenant-level isolation for multi-tenancy
Databricks structures metrics through:
- Unity Catalog objects: First-class metric views alongside tables
- Spark SQL expressions: Native Spark SQL for metric calculations
- MEASURE clause: SQL syntax for querying metrics
Migration Path
The migration follows this sequence:
- Map GoodData datasets to Delta Lake tables
- Translate relationship definitions to join specifications
- Convert MAQL metrics to Spark SQL expressions
- Register metric views in Unity Catalog
- Implement governance through Unity Catalog RBAC
Key Architectural Shift
| Component | GoodData | Databricks |
|---|---|---|
| Metric language | MAQL (proprietary) | Spark SQL (standard) |
| Execution | MAQL interpreter → vendor SQL | Native Spark SQL + Photon |
| Query syntax | GoodData APIs | MEASURE() clause |
| Storage | YAML or API-managed | Unity Catalog objects |
| Isolation | Workspaces | Schemas + RBAC |
Step 1: Map GoodData LDM to Delta Lake Schema
Export LDM structure
Extract your complete LDM configuration from GoodData. The LDM contains datasets (boxes), relationships (arrows), and metric definitions written in MAQL.
Each dataset contains:
- Facts: Numeric columns for aggregation (order_amount, quantity)
- Attributes: Categorical dimensions (region, category, status)
- Primary keys: Identify unique records
- Foreign keys: Link to other datasets
Relationships define join paths:
- 1:N arrows: One parent record to many child records
- M:N arrows: Many-to-many through bridge tables
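For the M:N case, the bridge dataset becomes its own Delta table that both sides join through. A minimal sketch, with hypothetical table and column names:
```sql
-- Hypothetical bridge table for a many-to-many relationship
-- between orders and promotions
CREATE TABLE semantic.bridge_order_promotions (
  order_id     STRING NOT NULL,
  promotion_id STRING NOT NULL,
  CONSTRAINT pk_order_promotions PRIMARY KEY (order_id, promotion_id)
) USING DELTA;
```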
Create Delta Lake tables
Map each GoodData dataset to a Delta Lake table:
Fact datasets → Fact tables in Delta Lake:
```sql
CREATE TABLE semantic.fact_orders (
  order_id STRING NOT NULL,
  customer_id STRING NOT NULL,
  product_id STRING NOT NULL,
  order_date DATE NOT NULL,
  order_amount DECIMAL(18,2),
  quantity INT,
  CONSTRAINT pk_orders PRIMARY KEY (order_id)
) USING DELTA
LOCATION 'dbfs:/semantic/fact_orders';
```
Dimension datasets → Dimension tables:
```sql
CREATE TABLE semantic.dim_customers (
  customer_id STRING NOT NULL,
  customer_name STRING,
  region STRING,
  tier STRING,
  CONSTRAINT pk_customers PRIMARY KEY (customer_id)
) USING DELTA
LOCATION 'dbfs:/semantic/dim_customers';
```
Date datasets → Date dimension:
```sql
CREATE TABLE semantic.dim_dates (
  date_key DATE NOT NULL,
  year INT,
  quarter INT,
  month INT,
  day INT,
  day_of_week INT,
  week_of_year INT,
  month_start_date DATE,
  quarter_start_date DATE,
  CONSTRAINT pk_dates PRIMARY KEY (date_key)
) USING DELTA
LOCATION 'dbfs:/semantic/dim_dates';
```
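The date dimension can be populated directly in Spark SQL from a generated date range; a sketch (the date span is illustrative):
```sql
INSERT INTO semantic.dim_dates
SELECT
  d                                      AS date_key,
  year(d)                                AS year,
  quarter(d)                             AS quarter,
  month(d)                               AS month,
  day(d)                                 AS day,
  dayofweek(d)                           AS day_of_week,
  weekofyear(d)                          AS week_of_year,
  CAST(date_trunc('month', d) AS DATE)   AS month_start_date,
  CAST(date_trunc('quarter', d) AS DATE) AS quarter_start_date
FROM (
  -- Illustrative range; adjust to cover your historical and planning horizon
  SELECT explode(sequence(DATE'2020-01-01', DATE'2030-12-31', INTERVAL 1 DAY)) AS d
);
```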
Translate relationships to join specifications
GoodData relationship arrows become explicit join definitions in metric view YAML:
GoodData LDM:
Orders → Customers (via customer_id)
Orders → Products (via product_id)
Orders → Dates (via order_date)
Databricks YAML:
```yaml
joins:
  - table: semantic.dim_customers
    on: fact_orders.customer_id = dim_customers.customer_id
  - table: semantic.dim_products
    on: fact_orders.product_id = dim_products.product_id
  - table: semantic.dim_dates
    on: fact_orders.order_date = dim_dates.date_key
```
Multi-hop joins (Orders → Stores → Regions):
```yaml
joins:
  - table: semantic.dim_stores
    on: fact_orders.store_id = dim_stores.store_id
  - table: semantic.dim_regions
    on: dim_stores.region_id = dim_regions.region_id
```
Step 2: Translate MAQL Metrics to Spark SQL
Simple aggregations
GoodData MAQL:
SELECT SUM({fact/order_amount})
Databricks YAML:
```yaml
measures:
  - name: total_revenue
    expr: SUM(order_amount)
    description: "Sum of all order amounts"
```
GoodData MAQL with filter:
SELECT SUM({fact/order_amount}) WHERE {label/order_status} = "Completed"
Databricks YAML:
```yaml
measures:
  - name: completed_revenue
    expr: SUM(CASE WHEN order_status = 'Completed' THEN order_amount ELSE 0 END)
    description: "Revenue from completed orders only"
```
Ratio metrics
MAQL ratio metrics require translation to preserve correct aggregation semantics:
GoodData MAQL:
SELECT SUM({fact/order_amount}) / COUNT({fact/order_id}, ALL)
Databricks YAML:
```yaml
measures:
  - name: avg_order_value
    expr: SUM(order_amount) / COUNT(DISTINCT order_id)
    description: "Average revenue per order"
```
The Databricks expression computes the ratio after aggregation, ensuring correct results regardless of grouping dimensions.
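For example, the same avg_order_value measure returns a correct per-group ratio whether it is grouped by region, by category, or both (using the MEASURE() syntax covered in Step 4):
```sql
SELECT
  customer_region,
  product_category,
  MEASURE(avg_order_value) AS aov
FROM semantic.sales_metrics
GROUP BY customer_region, product_category;
```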
Derived metrics
Metrics referencing other metrics require restructuring:
GoodData MAQL:
SELECT {metric/revenue} - {metric/cost}
Databricks YAML:
```yaml
measures:
  - name: revenue
    expr: SUM(order_amount)
  - name: cost
    expr: SUM(unit_cost * quantity)
  - name: profit
    expr: revenue - cost
    description: "Profit computed from revenue minus cost"
```
Databricks resolves measure references automatically, computing base measures before applying derived calculations.
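Assuming these measures are registered in a metric view such as semantic.sales_metrics, a query against the derived measure reads like any other:
```sql
SELECT
  customer_region,
  MEASURE(revenue) AS revenue,
  MEASURE(cost) AS cost,
  MEASURE(profit) AS profit
FROM semantic.sales_metrics
GROUP BY customer_region;
```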
MAQL BY clause translation
MAQL's BY clause changes the aggregation context. Translate it to Spark SQL window functions:
GoodData MAQL:
SELECT SUM({fact/amount}) BY {label/region}, ALL {label/state}
Databricks YAML:
```yaml
measures:
  - name: regional_total
    expr: SUM(SUM(amount)) OVER (PARTITION BY region)
    description: "Total aggregated to region level"
```
MAQL FOR clause translation
GoodData's FOR clause applies temporary filters. Use CASE statements:
GoodData MAQL:
SELECT {metric/revenue} FOR {label/region} = "West"
Databricks YAML:
```yaml
measures:
  - name: west_region_revenue
    expr: SUM(CASE WHEN region = 'West' THEN order_amount ELSE 0 END)
    description: "Revenue for West region only"
```
MAQL ALL operator
The ALL operator in MAQL ignores certain dimensions. Translate it to subqueries or window functions:
GoodData MAQL:
SELECT SUM({fact/amount}) / SUM({fact/amount}) BY ALL {label/category}
Databricks alternative:
```yaml
measures:
  - name: category_pct_of_total
    expr: SUM(amount) / SUM(SUM(amount)) OVER ()
    description: "Category amount as percentage of total"
```
Step 3: Create Metric Views in Unity Catalog
Define metric view YAML
Structure complete metric view definitions in YAML:
```yaml
metric_view:
  name: sales_metrics
  description: "Core sales and revenue metrics"
  source_table: semantic.fact_orders
  joins:
    - table: semantic.dim_customers
      on: fact_orders.customer_id = dim_customers.customer_id
    - table: semantic.dim_products
      on: fact_orders.product_id = dim_products.product_id
    - table: semantic.dim_dates
      on: fact_orders.order_date = dim_dates.date_key
  measures:
    - name: total_revenue
      expr: SUM(order_amount)
      description: "Sum of all order amounts"
    - name: order_count
      expr: COUNT(DISTINCT order_id)
      description: "Count of distinct orders"
    - name: avg_order_value
      expr: SUM(order_amount) / COUNT(DISTINCT order_id)
      description: "Average revenue per order"
    - name: unique_customers
      expr: COUNT(DISTINCT customer_id)
      description: "Count of distinct customers"
  dimensions:
    - name: customer_region
      expr: dim_customers.region
      description: "Customer geographic region"
    - name: product_category
      expr: dim_products.category
      description: "Product category"
    - name: order_date
      expr: fact_orders.order_date
      description: "Date of order placement"
    - name: order_month
      expr: DATE_TRUNC('month', fact_orders.order_date)
      description: "Month of order placement"
```
Register metric view
Upload YAML and register in Unity Catalog:
```sql
-- Upload to volume
PUT 'file:///local/path/sales_metrics.yaml'
  INTO '/Volumes/semantic/definitions/sales_metrics.yaml';

-- Create metric view from YAML
CREATE METRIC VIEW semantic.sales_metrics
FROM '/Volumes/semantic/definitions/sales_metrics.yaml';
```
Alternative inline SQL DDL:
```sql
CREATE METRIC VIEW semantic.sales_metrics AS
SELECT
  SUM(fo.order_amount) AS total_revenue,
  COUNT(DISTINCT fo.order_id) AS order_count,
  SUM(fo.order_amount) / COUNT(DISTINCT fo.order_id) AS avg_order_value,
  dc.region AS customer_region,
  dp.category AS product_category,
  fo.order_date AS order_date
FROM semantic.fact_orders fo
JOIN semantic.dim_customers dc ON fo.customer_id = dc.customer_id
JOIN semantic.dim_products dp ON fo.product_id = dp.product_id;
```
Configure permissions
Implement Unity Catalog RBAC:
```sql
-- Grant SELECT on the metric view
GRANT SELECT ON METRIC VIEW semantic.sales_metrics TO `analysts`;

-- Grant MODIFY for maintainers
GRANT MODIFY ON METRIC VIEW semantic.sales_metrics TO `data_engineers`;

-- Users only need SELECT on the metric view, not the underlying tables
```
Step 4: Query Metric Views
Basic queries
Query metrics using the MEASURE() clause:
```sql
SELECT
  customer_region,
  MEASURE(total_revenue) AS revenue,
  MEASURE(order_count) AS orders
FROM semantic.sales_metrics
GROUP BY customer_region;
```
Filtered queries
Apply standard SQL WHERE clauses:
```sql
SELECT
  product_category,
  MEASURE(total_revenue) AS revenue
FROM semantic.sales_metrics
WHERE order_date >= '2025-01-01'
  AND customer_region = 'West'
GROUP BY product_category;
```
Multi-dimensional analysis
Slice metrics across multiple dimensions:
```sql
SELECT
  order_month,
  customer_region,
  product_category,
  MEASURE(total_revenue) AS revenue,
  MEASURE(avg_order_value) AS aov
FROM semantic.sales_metrics
WHERE order_date >= '2025-01-01'
GROUP BY order_month, customer_region, product_category
ORDER BY order_month, revenue DESC;
```
Time-series analysis
Leverage date dimensions for time-based queries:
```sql
SELECT
  order_month,
  MEASURE(total_revenue) AS revenue,
  LAG(MEASURE(total_revenue)) OVER (ORDER BY order_month) AS prev_month_revenue,
  (MEASURE(total_revenue) - LAG(MEASURE(total_revenue)) OVER (ORDER BY order_month))
    / LAG(MEASURE(total_revenue)) OVER (ORDER BY order_month) * 100 AS growth_pct
FROM semantic.sales_metrics
WHERE order_date >= '2024-01-01'
GROUP BY order_month
ORDER BY order_month;
```
Step 5: Implement Governance
Certification
Mark production-ready metrics as certified:
```sql
ALTER METRIC VIEW semantic.sales_metrics SET TBLPROPERTIES (
  'certified' = 'true',
  'owner' = 'finance_team',
  'certification_date' = '2025-01-15'
);
```
Certified metrics display trust badges in Unity Catalog UI.
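To confirm the properties were applied, the standard property inspection command should work against the metric view as well; a quick check:
```sql
-- Inspect properties set on the metric view
SHOW TBLPROPERTIES semantic.sales_metrics;
```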
Tagging
Apply organizational tags:
```sql
ALTER METRIC VIEW semantic.sales_metrics SET TAGS (
  'domain' = 'finance',
  'pii' = 'false',
  'update_frequency' = 'daily',
  'data_source' = 'erp'
);
```
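Tags can then drive discovery. Assuming the Unity Catalog information schema tag views are available in your workspace, an inventory of finance-domain objects might look like:
```sql
SELECT table_name, tag_name, tag_value
FROM system.information_schema.table_tags
WHERE schema_name = 'semantic'
  AND tag_name = 'domain'
  AND tag_value = 'finance';
```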
Documentation
Document metrics with business context:
```yaml
measures:
  - name: total_revenue
    expr: SUM(order_amount)
    description: |
      Total order revenue before discounts and refunds.
      Excludes cancelled and pending orders.

      Source: ERP system via nightly batch load.
      Update frequency: Daily at 2 AM UTC.
      Historical data: Available from 2020-01-01.

      Business owner: finance@company.com
      Technical owner: data-eng@company.com
```
Lineage
Unity Catalog tracks lineage automatically:
```sql
-- View lineage
SELECT *
FROM system.access.table_lineage
WHERE target_name = 'semantic.sales_metrics';
```
Step 6: Handle Multi-Tenancy Patterns
GoodData workspaces provide tenant isolation. Implement equivalent patterns in Databricks:
Approach 1: Separate schemas
Create tenant-specific schemas:
```sql
-- Tenant A
CREATE SCHEMA semantic.tenant_a;
CREATE METRIC VIEW semantic.tenant_a.sales_metrics
FROM '/Volumes/semantic/definitions/tenant_a_sales.yaml';

-- Tenant B
CREATE SCHEMA semantic.tenant_b;
CREATE METRIC VIEW semantic.tenant_b.sales_metrics
FROM '/Volumes/semantic/definitions/tenant_b_sales.yaml';
```
Approach 2: Row-level security
Single metric view with tenant filtering:
```sql
CREATE METRIC VIEW semantic.multi_tenant_metrics AS
SELECT
  SUM(order_amount) AS total_revenue,
  COUNT(DISTINCT order_id) AS order_count,
  tenant_id,
  customer_region
FROM semantic.fact_orders
WHERE tenant_id = current_user_tenant_id();
```
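Note that current_user_tenant_id() is not a built-in function. One way to supply it is a SQL UDF over a user-to-tenant mapping table; a sketch with hypothetical names, assuming your runtime supports scalar subqueries in SQL functions:
```sql
-- Hypothetical mapping of users to tenants
CREATE TABLE semantic.user_tenant_map (
  user_email STRING NOT NULL,
  tenant_id  STRING NOT NULL
) USING DELTA;

-- Resolve the calling user's tenant at query time
CREATE FUNCTION semantic.current_user_tenant_id()
RETURNS STRING
RETURN (
  SELECT tenant_id
  FROM semantic.user_tenant_map
  WHERE user_email = current_user()
);
```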
Approach 3: Separate metric views
Multiple metric view instances on shared data:
```yaml
# tenant_a_metrics.yaml
metric_view:
  name: tenant_a_sales_metrics
  source_table: semantic.fact_orders
  filters:
    - tenant_id = 'tenant_a'
  measures:
    - name: total_revenue
      expr: SUM(order_amount)
```
Step 7: Validate Metric Outputs with Fenic
Configure Fenic session
Set up Fenic for validation pipelines:
```python
import fenic as fc

session = fc.Session.get_or_create(
    fc.SessionConfig(
        app_name="metric_validation",
        semantic=fc.SemanticConfig(
            language_models={
                "flash": fc.GoogleDeveloperLanguageModel(
                    model_name="gemini-2.0-flash",
                    rpm=100,
                    tpm=1000
                )
            },
            default_language_model="flash"
        )
    )
)
```
Read source data exports
Load data exports for validation:
```python
# Read source system data
source_data = session.read.csv("data/exports/orders.csv")

# Transform and prepare
prepared = source_data.select(
    fc.col("order_id"),
    fc.col("order_date").cast(fc.DateType()),
    fc.col("customer_id"),
    fc.col("order_amount").cast(fc.DoubleType()),
    fc.col("region")
).filter(
    fc.col("order_amount") > 0
)
```
Compare metric outputs
Validate Databricks metrics against expected values:
```python
# Continues from the Fenic session configured above

# Read Databricks results (exported to CSV for validation)
databricks_metrics = session.read.csv(
    "data/databricks_output/revenue_by_region.csv"
)

# Read expected values
expected_metrics = session.read.csv(
    "data/expected/revenue_by_region.csv"
)

# Compare actual vs. expected revenue per region
comparison = databricks_metrics.join(
    expected_metrics,
    left_on=fc.col("customer_region"),
    right_on=fc.col("region"),
    how="inner"
).with_column(
    "variance",
    fc.col("revenue") - fc.col("expected_revenue")
).with_column(
    "abs_variance",
    fc.when(fc.col("variance") >= 0, fc.col("variance"))
      .otherwise(-fc.col("variance"))
).with_column(
    "variance_pct",
    fc.col("abs_variance") / fc.col("expected_revenue") * 100
).select(
    fc.col("customer_region"),
    fc.col("revenue").alias("actual_revenue"),
    fc.col("expected_revenue"),
    fc.col("variance"),
    fc.col("variance_pct")
)

# Flag variances above 0.1%
issues = comparison.filter(fc.col("variance_pct") > 0.1)
issues.show()
```
Semantic validation
Use Fenic's semantic operators to validate metric logic:
```python
from typing import List

from pydantic import BaseModel, Field


class MetricValidation(BaseModel):
    metric_name: str = Field(description="Name of the metric")
    logic_correct: bool = Field(description="Whether the metric logic is correct")
    issues: List[str] = Field(description="List of any issues found")


# Load metric definition documents (exported as JSON) for review
metric_docs = session.read.docs(
    "data/metric_definitions.json",
    content_type="json"
)

# read.docs exposes each document's text in the "content" column
validated = metric_docs.with_column(
    "validation",
    fc.semantic.extract(
        fc.col("content"),
        MetricValidation
    )
).unnest("validation")

# Review metrics with issues
problems = validated.filter(fc.col("logic_correct") == False)
problems.show()
```
Step 8: Optimize Performance
Enable Photon
Configure SQL warehouses with Photon acceleration:
```sql
CREATE WAREHOUSE semantic_warehouse
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_STOP = 15
  ENABLE_PHOTON = TRUE;
```
Optimize table layout
Apply Z-ORDER clustering on commonly filtered columns:
```sql
OPTIMIZE semantic.fact_orders
ZORDER BY (customer_id, order_date, product_id);
```
Maintain table statistics
Update statistics for query optimization:
```sql
ANALYZE TABLE semantic.fact_orders COMPUTE STATISTICS;
ANALYZE TABLE semantic.dim_customers COMPUTE STATISTICS;
ANALYZE TABLE semantic.dim_products COMPUTE STATISTICS;
```
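Column-level statistics on frequently filtered and joined columns can further improve join ordering and filter selectivity estimates:
```sql
ANALYZE TABLE semantic.fact_orders
COMPUTE STATISTICS FOR COLUMNS customer_id, product_id, order_date;
```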
Materialized aggregations
Create materialized views for frequently queried aggregations:
```sql
CREATE MATERIALIZED VIEW semantic.monthly_revenue AS
SELECT
  DATE_TRUNC('month', order_date) AS month,
  customer_region,
  MEASURE(total_revenue) AS revenue,
  MEASURE(order_count) AS orders
FROM semantic.sales_metrics
GROUP BY month, customer_region;
```
Common Translation Challenges
Logical normalization
GoodData's attribute hierarchies (State → Region) must be explicitly modeled:
GoodData: Implicit hierarchy in the LDM
Databricks: Explicit dimension table
```sql
CREATE TABLE semantic.dim_geography (
  state_code STRING NOT NULL,
  state_name STRING,
  region_code STRING,
  region_name STRING,
  PRIMARY KEY (state_code)
) USING DELTA;
```
Reference in metric view:
```yaml
dimensions:
  - name: state
    expr: dim_geography.state_name
  - name: region
    expr: dim_geography.region_name
```
Context-free metric reusability
GoodData metrics automatically adapt to query dimensions. Databricks metrics work similarly but require explicit dimension definitions:
GoodData: Single metric works across all dimensions
Databricks: Same behavior through MEASURE() clause
Both systems achieve context-free reusability. The difference is syntactic, not functional.
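For example, the total_revenue measure defined once in Step 3 can be queried at any grain without redefinition:
```sql
-- Regional grain
SELECT customer_region, MEASURE(total_revenue) AS revenue
FROM semantic.sales_metrics
GROUP BY customer_region;

-- Monthly grain per product category
SELECT order_month, product_category, MEASURE(total_revenue) AS revenue
FROM semantic.sales_metrics
GROUP BY order_month, product_category;
```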
Workspace-level filtering
GoodData workspaces isolate data. Databricks requires explicit filtering:
```yaml
metric_view:
  name: filtered_metrics
  source_table: semantic.fact_orders
  filters:
    - workspace_id = current_workspace()
  measures:
    - name: total_revenue
      expr: SUM(order_amount)
```
Or implement through views:
```sql
CREATE VIEW semantic.workspace_a_orders AS
SELECT *
FROM semantic.fact_orders
WHERE workspace_id = 'workspace_a';
```
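Access to each per-workspace view is then granted to the matching group (the group name here is hypothetical):
```sql
GRANT SELECT ON VIEW semantic.workspace_a_orders TO `workspace_a_users`;
```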
Monitoring Metric Usage
Query audit logs
Track metric view queries:
```sql
SELECT
  request_params.metric_view AS metric_view,
  COUNT(*) AS query_count,
  AVG(request_params.execution_time_ms) AS avg_duration_ms,
  COUNT(DISTINCT user_identity.email) AS unique_users
FROM system.access.audit
WHERE action_name = 'executeMeasure'
  AND request_params.metric_view IS NOT NULL
  AND event_date >= CURRENT_DATE() - INTERVAL 30 DAYS
GROUP BY metric_view
ORDER BY query_count DESC;
```
Performance metrics
Identify slow queries:
```sql
SELECT
  request_params.metric_view,
  request_params.execution_time_ms,
  request_params.query_text,
  user_identity.email
FROM system.access.audit
WHERE action_name = 'executeMeasure'
  AND request_params.execution_time_ms > 10000
  AND event_date >= CURRENT_DATE() - INTERVAL 7 DAYS
ORDER BY request_params.execution_time_ms DESC;
```
Usage patterns
Analyze dimension combinations:
```sql
SELECT
  request_params.dimensions_used,
  COUNT(*) AS frequency
FROM system.access.audit
WHERE action_name = 'executeMeasure'
  AND event_date >= CURRENT_DATE() - INTERVAL 30 DAYS
GROUP BY request_params.dimensions_used
ORDER BY frequency DESC;
```
Migration Validation Checklist
Data preparation:
- Export complete LDM structure
- Document all dataset relationships
- Extract all MAQL metric definitions
- Map datasets to Delta Lake tables
- Create star/snowflake schema
Metric translation:
- Convert simple aggregations
- Translate ratio metrics
- Convert derived metrics
- Handle MAQL BY clauses
- Translate MAQL FOR clauses
- Convert MAQL ALL operators
Implementation:
- Create metric view YAML definitions
- Register metric views in Unity Catalog
- Configure RBAC permissions
- Set up certification and tags
- Document business context
Validation:
- Compare metric outputs
- Validate multi-dimensional queries
- Test time-series calculations
- Verify filtered metric results
- Check derived metric accuracy
Performance:
- Enable Photon acceleration
- Apply Z-ORDER clustering
- Update table statistics
- Create materialized views for common queries
- Benchmark query response times
Monitoring:
- Set up query audit tracking
- Monitor performance metrics
- Track usage patterns
- Identify optimization opportunities
Conclusion
Migrating from GoodData LDM to Databricks Unity Catalog Metric Views translates proprietary MAQL definitions to standard Spark SQL while maintaining metric consistency and governance. The lakehouse-native approach unifies metrics across BI, data science, and ML workloads through Unity Catalog's first-class metric objects.
The migration requires systematic translation of dataset relationships to join specifications, MAQL expressions to Spark SQL, and workspace isolation to Unity Catalog RBAC. Fenic provides reliable semantic operators for validation pipelines during migration.
Organizations gain performance through Photon acceleration, governance through Unity Catalog, and flexibility through standard Spark SQL. The result is a scalable semantic layer integrated with the broader Databricks ecosystem.
For additional resources on semantic data processing, see semantic processing statistics and DataFrame semantic operations. The Typedef blog provides ongoing insights into modern data infrastructure patterns.
