$79
Medallion Architecture Accelerator
Production implementation of the Bronze/Silver/Gold architecture on Databricks, with a metadata-driven pipeline generator and quality gates.
Python, YAML, Markdown, JSON, Databricks, PySpark, Spark, Delta Lake
📁 File Structure 16 files
medallion-architecture-accelerator/
├── README.md
├── config/
│   ├── example_sources.yaml
│   └── pipeline_generator.py
├── docs/
│   ├── architecture_guide.md
│   └── performance_benchmarks.md
├── framework/
│   ├── bronze/
│   │   ├── auto_loader_ingestor.py
│   │   ├── base_ingestor.py
│   │   └── jdbc_ingestor.py
│   ├── gold/
│   │   ├── aggregation_builder.py
│   │   └── fact_table_builder.py
│   └── silver/
│       ├── base_transformer.py
│       └── scd_type2.py
├── optimization/
│   └── delta_optimization.py
├── quality/
│   └── quality_gates.py
└── testing/
    └── layer_tests.py
📖 Documentation Preview README excerpt
Medallion Architecture Accelerator
By [Datanest Digital](https://datanest.dev) | Version 1.0.0 | $79
Production-ready framework for implementing the Medallion Architecture (Bronze / Silver / Gold) on Databricks with Delta Lake. Eliminate weeks of boilerplate and enforce best practices from day one.
---
What You Get
Bronze Layer (Raw Ingestion)
- Base Ingestor -- Schema evolution, exactly-once semantics, source lineage tracking
- Auto Loader Ingestor -- cloudFiles-based streaming ingestion with a rescued-data column
- JDBC Ingestor -- Incremental extraction from relational databases with watermark tracking
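The watermark tracking mentioned for the JDBC Ingestor can be pictured roughly as follows. This is a minimal sketch in plain Python, not the product's implementation: the `Watermark` class, `build_incremental_query`, and `advance_watermark` are hypothetical names, and the watermark is assumed to be an ISO-8601 timestamp string so that string comparison matches chronological order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Watermark:
    """Hypothetical record of the highest value extracted so far."""
    column: str
    last_value: Optional[str] = None  # None means no previous run.


def build_incremental_query(table: str, watermark: Watermark) -> str:
    """Return a SELECT that only reads rows newer than the stored watermark."""
    base = f"SELECT * FROM {table}"
    if watermark.last_value is None:
        # First run: no watermark yet, so extract the full table.
        return base
    # String formatting is for illustration only; real code should bind parameters.
    return f"{base} WHERE {watermark.column} > '{watermark.last_value}'"


def advance_watermark(watermark: Watermark, extracted_values: List[str]) -> Watermark:
    """Move the watermark forward to the largest value seen in this batch."""
    if not extracted_values:
        return watermark  # Nothing extracted; keep the old watermark.
    candidates = list(extracted_values)
    if watermark.last_value is not None:
        candidates.append(watermark.last_value)
    return Watermark(watermark.column, max(candidates))
```

In this sketch the watermark only advances after a batch has been read, so a failed run would re-extract the same rows on the next attempt rather than skip them.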
Silver Layer (Conformed / Cleansed)
- Base Transformer -- Deduplication, type casting, null handling, data conformance
- SCD Type 2 -- Slowly-changing dimension management with merge-based upserts
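To make the SCD Type 2 idea concrete, here is a small sketch of the close-out-and-insert logic on plain Python dictionaries. It is an illustration under assumptions, not the contents of `scd_type2.py`: the function name `apply_scd2` and the bookkeeping columns `valid_from`, `valid_to`, and `is_current` are hypothetical, and a real implementation on Databricks would typically express the same steps as a Delta merge.

```python
from typing import Any, Dict, List


def apply_scd2(
    dimension: List[Dict[str, Any]],
    updates: List[Dict[str, Any]],
    key: str,
    tracked: List[str],
    effective_date: str,
) -> List[Dict[str, Any]]:
    """Return a new dimension with changed rows closed out and new versions appended."""
    result = [dict(row) for row in dimension]  # Copy so the input is not mutated.
    current = {row[key]: row for row in result if row["is_current"]}

    for update in updates:
        existing = current.get(update[key])
        if existing is not None:
            changed = any(existing[col] != update[col] for col in tracked)
            if not changed:
                continue  # Same attribute values: keep the current version.
            # Close out the old version instead of overwriting it.
            existing["is_current"] = False
            existing["valid_to"] = effective_date
        # Insert the new version (this also covers keys never seen before).
        new_row = {key: update[key], **{col: update[col] for col in tracked}}
        new_row.update(valid_from=effective_date, valid_to=None, is_current=True)
        result.append(new_row)
        current[update[key]] = new_row
    return result
```

The point of the pattern is that history is preserved: a changed customer ends up with two rows, one closed and one current, instead of a single overwritten row.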
Gold Layer (Business / Aggregated)
- Aggregation Builder -- Business KPIs, time-series rollups, dimensional model support
- Fact Table Builder -- Star-schema fact tables with conformed dimension joins
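As a rough picture of what a time-series rollup in the Gold layer does, the sketch below aggregates order rows to one total per day. It uses plain Python rather than PySpark so it runs anywhere; `daily_revenue` and the `amount` field are hypothetical, and `order_date` is assumed to be an ISO-8601 timestamp string.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def daily_revenue(orders: Iterable[Mapping[str, Any]]) -> Dict[str, float]:
    """Sum the hypothetical `amount` field per calendar day of `order_date`."""
    totals: Dict[str, float] = defaultdict(float)
    for order in orders:
        day = str(order["order_date"])[:10]  # Truncate the timestamp to YYYY-MM-DD.
        totals[day] += float(order["amount"])
    return dict(sorted(totals.items()))  # Sort by day for stable output.
```

In Spark the same shape would usually be a group-by on a truncated date column followed by a sum.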
Pipeline Generation
- Metadata-Driven Generator -- Define sources in YAML, generate all three layers automatically
- Example Configs -- Five pre-built source configurations covering common patterns
Quality & Optimization
- Quality Gates -- Null checks, schema validation, business rules, inter-layer assertions
- Delta Optimization -- OPTIMIZE, ZORDER, VACUUM, liquid clustering per layer tier
- Layer Tests -- Unit and integration testing framework for every layer
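A quality gate of the kind listed above can be thought of as a check that returns a pass/fail result, plus an enforcement step that stops the pipeline on failure. The following is a hedged sketch in plain Python, not the API of `quality_gates.py`: `GateResult`, `null_check`, `row_count_check`, `enforce`, and `QualityGateError` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


class QualityGateError(Exception):
    """Raised when one or more gates fail (hypothetical exception type)."""


@dataclass
class GateResult:
    check: str
    passed: bool
    detail: str


def null_check(rows: List[Dict[str, Any]], column: str,
               max_null_fraction: float = 0.0) -> GateResult:
    """Pass when the share of null values in `column` is within the threshold."""
    total = len(rows)
    nulls = sum(1 for row in rows if row.get(column) is None)
    fraction = nulls / total if total else 0.0
    return GateResult(f"null_check({column})", fraction <= max_null_fraction,
                      f"{nulls}/{total} rows null")


def row_count_check(upstream: int, downstream: int,
                    max_drop_fraction: float = 0.0) -> GateResult:
    """Inter-layer assertion: the downstream layer must not lose too many rows."""
    dropped = max(upstream - downstream, 0)
    fraction = dropped / upstream if upstream else 0.0
    return GateResult("row_count", fraction <= max_drop_fraction,
                      f"{dropped}/{upstream} rows dropped")


def enforce(results: List[GateResult]) -> None:
    """Raise if any gate failed, listing every failure in the message."""
    failures = [r for r in results if not r.passed]
    if failures:
        raise QualityGateError("; ".join(f"{r.check}: {r.detail}" for r in failures))
```

Separating the checks from `enforce` lets a pipeline collect every failure before aborting, which tends to make the resulting error message more useful than stopping at the first failed check.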
Documentation
- Architecture guide with patterns and anti-patterns
- Performance benchmarks and sizing guidelines
---
Quick Start
1. Configure a Source
# config/sources/orders.yaml
source:
  name: orders
  type: auto_loader
  format: json
  path: "s3://raw-bucket/orders/"
  schema_hints:
    order_id: long
    customer_id: long
    order_date: timestamp
bronze:
  database: raw
  table: orders
  partition_columns: [ingestion_date]
silver:
*... continues with setup instructions, usage examples, and more.*
📄 Code Sample .py preview
config/pipeline_generator.py
# Databricks notebook source
# MAGIC %md
# MAGIC # Metadata-Driven Pipeline Generator
# MAGIC **Medallion Architecture Accelerator** by [Datanest Digital](https://datanest.dev)
# MAGIC
# MAGIC Reads a YAML configuration file and generates complete Bronze, Silver,
# MAGIC and Gold layer pipelines. Enables a declarative, config-driven
# MAGIC approach to building medallion architectures.
# COMMAND ----------
from __future__ import annotations
import copy
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional
import yaml
from pyspark.sql import SparkSession
# COMMAND ----------
# MAGIC %md
# MAGIC ## Pipeline Configuration Model
# COMMAND ----------
@dataclass
class SourceConfig:
    """Parsed representation of a single source YAML configuration.

    This is an intermediate model that the generator converts into
    concrete layer configurations (Bronze, Silver, Gold).
    """

    # Source identification.
    name: str = ""
    source_system: str = ""
    source_type: str = "auto_loader"  # auto_loader | jdbc | custom

    # Bronze settings.
    source_format: str = "json"
    source_path: str = ""
    schema_hints: Dict[str, str] = field(default_factory=dict)
    bronze_catalog: str = ""
    bronze_database: str = ""
    bronze_table: str = ""
# ... 307 more lines ...
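Since the rest of the file is not shown, the following sketch only illustrates how a parsed YAML document might be mapped onto a flat config object like the one above. It is an assumption-laden example, not the generator's actual code: `SourceConfigSketch` is a trimmed stand-in limited to fields visible in the preview, `parse_source_config` is a hypothetical helper, and the nesting (`source` and `bronze` as top-level keys) is a guess based on the Quick Start excerpt. The input is the kind of dictionary that `yaml.safe_load` returns, written here as a literal so the example has no third-party dependency.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SourceConfigSketch:
    """Trimmed stand-in for SourceConfig (hypothetical, preview fields only)."""
    name: str = ""
    source_type: str = "auto_loader"
    source_format: str = "json"
    source_path: str = ""
    schema_hints: Dict[str, str] = field(default_factory=dict)
    bronze_database: str = ""
    bronze_table: str = ""


def parse_source_config(raw: Dict[str, Any]) -> SourceConfigSketch:
    """Flatten the assumed nested YAML layout into the dataclass fields."""
    source = raw.get("source") or {}
    bronze = raw.get("bronze") or {}
    return SourceConfigSketch(
        name=source.get("name", ""),
        source_type=source.get("type", "auto_loader"),
        source_format=source.get("format", "json"),
        source_path=source.get("path", ""),
        schema_hints=dict(source.get("schema_hints") or {}),
        bronze_database=bronze.get("database", ""),
        bronze_table=bronze.get("table", ""),
    )


# Dictionary equivalent of the Quick Start YAML (nesting is an assumption).
raw_config = {
    "source": {
        "name": "orders",
        "type": "auto_loader",
        "format": "json",
        "path": "s3://raw-bucket/orders/",
        "schema_hints": {"order_id": "long", "customer_id": "long"},
    },
    "bronze": {"database": "raw", "table": "orders"},
}
config = parse_source_config(raw_config)
```

Keeping defaults on every field means a partially filled config still parses, which leaves validation of required values to a separate step.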