← Back to all products

Medallion Architecture Accelerator

$79

Production implementation of Bronze/Silver/Gold architecture on Databricks with metadata-driven pipeline generator and quality gates.

📁 16 files · 🏷 v1.0.0

Python · YAML · Markdown · JSON · Databricks · PySpark · Spark · Delta Lake

📁 File Structure (16 files)

```
medallion-architecture-accelerator/
├── README.md
├── config/
│   ├── example_sources.yaml
│   └── pipeline_generator.py
├── docs/
│   ├── architecture_guide.md
│   └── performance_benchmarks.md
├── framework/
│   ├── bronze/
│   │   ├── auto_loader_ingestor.py
│   │   ├── base_ingestor.py
│   │   └── jdbc_ingestor.py
│   ├── gold/
│   │   ├── aggregation_builder.py
│   │   └── fact_table_builder.py
│   └── silver/
│       ├── base_transformer.py
│       └── scd_type2.py
├── optimization/
│   └── delta_optimization.py
├── quality/
│   └── quality_gates.py
└── testing/
    └── layer_tests.py
```

📖 Documentation Preview (README excerpt)

Medallion Architecture Accelerator

By [Datanest Digital](https://datanest.dev) | Version 1.0.0 | $79

Production-ready framework for implementing the Medallion Architecture (Bronze / Silver / Gold) on Databricks with Delta Lake. Eliminate weeks of boilerplate and enforce best practices from day one.

---

What You Get

Bronze Layer (Raw Ingestion)

  • Base Ingestor -- Schema evolution, exactly-once semantics, source lineage tracking
  • Auto Loader Ingestor -- cloudFiles-based streaming ingestion with rescued-data columns
  • JDBC Ingestor -- Incremental extraction from relational databases with watermark tracking
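The watermark pattern behind incremental JDBC extraction can be sketched in plain Python. The function and column names below are illustrative, not the accelerator's actual API; in the framework this predicate would be pushed down through a Spark JDBC read:

```python
from typing import Optional


def build_incremental_predicate(watermark_column: str,
                                last_watermark: Optional[str]) -> str:
    """Build the WHERE clause pushed down to the source database.

    On the first run there is no stored watermark, so the full table is read;
    afterwards only rows newer than the last successful extraction are pulled.
    """
    if last_watermark is None:
        return "1 = 1"  # initial full load
    return f"{watermark_column} > '{last_watermark}'"


def advance_watermark(rows: list, watermark_column: str) -> Optional[str]:
    """Return the max watermark value seen in this batch, or None if empty."""
    if not rows:
        return None
    return max(str(r[watermark_column]) for r in rows)


# Example: second run, after a load that ended at 2024-01-01T00:00:00
predicate = build_incremental_predicate("updated_at", "2024-01-01T00:00:00")
# "updated_at > '2024-01-01T00:00:00'"
```

Persisting the watermark only after the batch commits is what makes the extraction restartable without gaps or duplicates.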

Silver Layer (Conformed / Cleansed)

  • Base Transformer -- Deduplication, type casting, null handling, data conformance
  • SCD Type 2 -- Slowly-changing dimension management with merge-based upserts
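The row-versioning logic behind SCD Type 2 can be illustrated with a minimal pure-Python sketch. In the framework itself this is a Delta Lake MERGE over DataFrames; the column names (`valid_from`, `valid_to`, `is_current`) and function name here are illustrative assumptions:

```python
def apply_scd2(current: list, incoming: list, key: str, batch_ts: str) -> list:
    """Close changed rows and append new versions, SCD Type 2 style.

    Each stored row carries `valid_from`, `valid_to` and `is_current` columns.
    For brevity this mutates the closed rows in place.
    """
    by_key = {r[key]: r for r in current if r["is_current"]}
    out = list(current)
    for row in incoming:
        existing = by_key.get(row[key])
        tracked = {k: v for k, v in row.items() if k != key}
        if existing is not None:
            if all(existing.get(k) == v for k, v in tracked.items()):
                continue  # no attribute changed: keep the current version
            existing["is_current"] = False  # close the old version
            existing["valid_to"] = batch_ts
        out.append({**row, "valid_from": batch_ts, "valid_to": None,
                    "is_current": True})
    return out
```

A changed attribute produces two rows for the same key: the closed historical version and the new current one, which is exactly what a merge-based SCD2 upsert yields.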

Gold Layer (Business / Aggregated)

  • Aggregation Builder -- Business KPIs, time-series rollups, dimensional model support
  • Fact Table Builder -- Star-schema fact tables with conformed dimension joins
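A time-series rollup of the kind the Aggregation Builder produces can be sketched in plain Python; the framework does this with PySpark `groupBy` over Gold tables, and the field names below are illustrative:

```python
from collections import defaultdict


def daily_revenue_rollup(orders: list) -> dict:
    """Group order amounts by calendar day: the simplest time-series rollup."""
    totals = defaultdict(float)
    for o in orders:
        day = o["order_ts"][:10]  # 'YYYY-MM-DD' prefix of an ISO timestamp
        totals[day] += o["amount"]
    return dict(totals)


orders = [
    {"order_ts": "2024-03-01T09:15:00", "amount": 40.0},
    {"order_ts": "2024-03-01T18:02:00", "amount": 10.0},
    {"order_ts": "2024-03-02T07:45:00", "amount": 25.0},
]
rollup = daily_revenue_rollup(orders)
# {'2024-03-01': 50.0, '2024-03-02': 25.0}
```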

Pipeline Generation

  • Metadata-Driven Generator -- Define sources in YAML, generate all three layers automatically
  • Example Configs -- Five pre-built source configurations covering common patterns
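The declarative idea is that one source definition fans out into table definitions for all three layers. A minimal sketch, with keys modelled on the Quick Start YAML; the database names (`raw`, `conformed`, `analytics`) and function name are assumptions, not the generator's actual output:

```python
def plan_tables(source: dict) -> dict:
    """Derive fully qualified Bronze/Silver/Gold table names from one source config."""
    name = source["name"]
    return {
        "bronze": f"raw.{name}",
        "silver": f"conformed.{name}",
        "gold": f"analytics.agg_{name}",
    }


plan = plan_tables({"name": "orders", "type": "auto_loader"})
# {'bronze': 'raw.orders', 'silver': 'conformed.orders', 'gold': 'analytics.agg_orders'}
```

Because every layer's table is derived from one record, adding a new source is a config change rather than three new pipelines.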

Quality & Optimization

  • Quality Gates -- Null checks, schema validation, business rules, inter-layer assertions
  • Delta Optimization -- OPTIMIZE, ZORDER, VACUUM, liquid clustering per layer tier
  • Layer Tests -- Unit and integration testing framework for every layer
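A null-check gate of the kind listed above can be sketched in plain Python; the framework runs such checks against Spark DataFrames between layers, and the function name and threshold parameter here are illustrative:

```python
def null_check(rows: list, column: str, max_null_fraction: float = 0.0) -> bool:
    """Pass when the fraction of nulls in `column` does not exceed the threshold."""
    if not rows:
        return True  # an empty batch has nothing to fail on
    nulls = sum(1 for r in rows if r.get(column) is None)
    return nulls / len(rows) <= max_null_fraction


rows = [{"order_id": 1}, {"order_id": None}, {"order_id": 3}, {"order_id": 4}]
passed = null_check(rows, "order_id", max_null_fraction=0.25)  # True: 1/4 nulls
strict = null_check(rows, "order_id")                          # False: any null fails
```

Running such gates between Bronze and Silver (and again before Gold) is what turns quality rules into hard promotion criteria rather than after-the-fact reports.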

Documentation

  • Architecture guide with patterns and anti-patterns
  • Performance benchmarks and sizing guidelines

---

Quick Start

1. Configure a Source


```yaml
# config/sources/orders.yaml
source:
  name: orders
  type: auto_loader
  format: json
  path: "s3://raw-bucket/orders/"
  schema_hints:
    order_id: long
    customer_id: long
    order_date: timestamp

bronze:
  database: raw
  table: orders
  partition_columns: [ingestion_date]

silver:
```

*... continues with setup instructions, usage examples, and more.*

📄 Code Sample (.py preview)

config/pipeline_generator.py

```python
# Databricks notebook source
# MAGIC %md
# MAGIC # Metadata-Driven Pipeline Generator
# MAGIC **Medallion Architecture Accelerator** by [Datanest Digital](https://datanest.dev)
# MAGIC
# MAGIC Reads a YAML configuration file and generates complete Bronze, Silver,
# MAGIC and Gold layer pipelines. Enables a declarative, config-driven
# MAGIC approach to building medallion architectures.

# COMMAND ----------

from __future__ import annotations

import copy
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional

import yaml
from pyspark.sql import SparkSession

# COMMAND ----------

# MAGIC %md
# MAGIC ## Pipeline Configuration Model

# COMMAND ----------


@dataclass
class SourceConfig:
    """Parsed representation of a single source YAML configuration.

    This is an intermediate model that the generator converts into
    concrete layer configurations (Bronze, Silver, Gold).
    """

    # Source identification.
    name: str = ""
    source_system: str = ""
    source_type: str = "auto_loader"  # auto_loader | jdbc | custom

    # Bronze settings.
    source_format: str = "json"
    source_path: str = ""
    schema_hints: Dict[str, str] = field(default_factory=dict)
    bronze_catalog: str = ""
    bronze_database: str = ""
    bronze_table: str = ""

# ... 307 more lines ...
```