← Back to all products
$39
Data Catalog Builder
Automated metadata extraction, data dictionary generation, lineage mapping, ownership assignment, and search interface.
PythonYAMLJSONMarkdownDatabricks
📁 File Structure 18 files
data-catalog-builder/
├── LICENSE
├── README.md
├── configs/
│ ├── catalog_config.yaml
│ ├── classification_rules.yaml
│ └── templates/
│ └── table_doc.md.j2
├── guides/
│ └── data-catalog-guide.md
├── notebooks/
│ ├── catalog_dashboard.py
│ └── scan_catalog.py
├── src/
│ ├── catalog_reporter.py
│ ├── catalog_scanner.py
│ ├── lineage_mapper.py
│ ├── metadata_enricher.py
│ ├── quality_scorer.py
│ └── search_index.py
└── tests/
├── conftest.py
├── test_catalog_scanner.py
└── test_quality_scorer.py
📖 Documentation Preview README excerpt
Data Catalog Builder
Automated metadata discovery, data dictionary generation, quality scoring, and searchable catalog for Databricks Unity Catalog.
By [Datanest Digital](https://datanest.dev) | Version 1.0.0 | $39
---
What You Get
- Catalog Scanner — Discover tables, columns, and metadata across Unity Catalog schemas
- Metadata Enricher — Add business descriptions, owners, tags, and classification labels
- Lineage Mapper — Trace column-level lineage from bronze to gold layers
- Search Index — Full-text search across table/column names, descriptions, and tags
- Quality Scorer — Score tables on completeness, freshness, documentation, and conformance
- Catalog Reporter — Generate Markdown data dictionaries from Jinja2 templates
File Tree
data-catalog-builder/
├── README.md
├── manifest.json
├── LICENSE
├── src/
│ ├── catalog_scanner.py # Unity Catalog metadata discovery
│ ├── metadata_enricher.py # Business metadata enrichment
│ ├── lineage_mapper.py # Column-level lineage tracing
│ ├── search_index.py # Full-text catalog search
│ ├── quality_scorer.py # Data quality scoring engine
│ └── catalog_reporter.py # Markdown report generator
├── configs/
│ ├── catalog_config.yaml # Scanner and enrichment settings
│ ├── classification_rules.yaml # PII / sensitivity classification
│ └── templates/
│ └── table_doc.md.j2 # Jinja2 table documentation template
├── notebooks/
│ ├── scan_catalog.py # Run catalog discovery scan
│ └── catalog_dashboard.py # Catalog health dashboard
├── tests/
│ ├── conftest.py # Shared test fixtures
│ ├── test_catalog_scanner.py # Scanner tests
│ └── test_quality_scorer.py # Quality scoring tests
└── guides/
└── data-catalog-guide.md # Setup and usage guide
Getting Started
1. Configure the Scanner
Edit configs/catalog_config.yaml with your Unity Catalog settings:
scanner:
catalogs: ["main"]
schemas: ["bronze", "silver", "gold"]
exclude_patterns: ["_tmp_*", "_staging_*"]
... continues with setup instructions, usage examples, and more.
📄 Code Sample .py preview
src/catalog_reporter.py
"""
Catalog Reporter — Generate Markdown data dictionaries from Jinja2 templates.
Renders table documentation using a configurable Jinja2 template and
writes the output to the Databricks workspace or local filesystem.
Author: Datanest Digital
"""
from __future__ import annotations
import logging
import os
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional
from jinja2 import Environment, FileSystemLoader, Template
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# CatalogReporter
# ---------------------------------------------------------------------------
class CatalogReporter:
"""Generate Markdown documentation for catalog tables.
Uses Jinja2 templates to render per-table data dictionaries that
include column metadata, tags, classifications, and quality scores.
Usage::
reporter = CatalogReporter(template_path="configs/templates/table_doc.md.j2")
reporter.generate_all(catalog, output_dir="/Workspace/docs/catalog")
"""
def __init__(
self,
template_path: str = "configs/templates/table_doc.md.j2",
) -> None:
self._template = self._load_template(template_path)
logger.info("CatalogReporter initialised with template: %s", template_path)
# ----- Template loading -----
@staticmethod
def _load_template(path: str) -> Template:
"""Load a Jinja2 template from a file path."""
# ... 133 more lines ...