
Data Catalog Builder

$39

Automated metadata extraction, data dictionary generation, lineage mapping, ownership assignment, and search interface.

📁 18 files | 🏷 v1.0.0
Python | YAML | JSON | Markdown | Databricks

📁 File Structure 18 files

data-catalog-builder/
├── LICENSE
├── README.md
├── configs/
│   ├── catalog_config.yaml
│   ├── classification_rules.yaml
│   └── templates/
│       └── table_doc.md.j2
├── guides/
│   └── data-catalog-guide.md
├── notebooks/
│   ├── catalog_dashboard.py
│   └── scan_catalog.py
├── src/
│   ├── catalog_reporter.py
│   ├── catalog_scanner.py
│   ├── lineage_mapper.py
│   ├── metadata_enricher.py
│   ├── quality_scorer.py
│   └── search_index.py
└── tests/
    ├── conftest.py
    ├── test_catalog_scanner.py
    └── test_quality_scorer.py

📖 Documentation Preview README excerpt

Data Catalog Builder

Automated metadata discovery, data dictionary generation, quality scoring, and searchable catalog for Databricks Unity Catalog.

By [Datanest Digital](https://datanest.dev) | Version 1.0.0 | $39

---

What You Get

  • Catalog Scanner — Discover tables, columns, and metadata across Unity Catalog schemas
  • Metadata Enricher — Add business descriptions, owners, tags, and classification labels
  • Lineage Mapper — Trace column-level lineage from bronze to gold layers
  • Search Index — Full-text search across table/column names, descriptions, and tags
  • Quality Scorer — Score tables on completeness, freshness, documentation, and conformance
  • Catalog Reporter — Generate Markdown data dictionaries from Jinja2 templates
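
To make the Quality Scorer's four dimensions concrete, here is a minimal sketch of how per-dimension scores could be combined into one table score. The `TableScores` class, the `overall_score` function, and the weights are all assumptions for illustration; the excerpt does not show how `quality_scorer.py` actually weights or combines them.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical weights; the product's real weighting is not shown in the excerpt.
DEFAULT_WEIGHTS = {
    "completeness": 0.3,
    "freshness": 0.3,
    "documentation": 0.2,
    "conformance": 0.2,
}


@dataclass
class TableScores:
    """Per-dimension scores for one table, each in the range 0.0 to 1.0."""

    completeness: float
    freshness: float
    documentation: float
    conformance: float


def overall_score(
    scores: TableScores, weights: Mapping[str, float] = DEFAULT_WEIGHTS
) -> float:
    """Return the weighted average of the four dimension scores."""
    total_weight = sum(weights.values())
    weighted = sum(getattr(scores, name) * w for name, w in weights.items())
    return weighted / total_weight


# Hypothetical scores for a single table.
orders = TableScores(completeness=1.0, freshness=0.5, documentation=0.5, conformance=1.0)
print(round(overall_score(orders), 2))  # prints 0.75
```

Dividing by the sum of the weights keeps the result in the 0.0 to 1.0 range even if the weights are later changed and no longer sum to one.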

File Tree


data-catalog-builder/
├── README.md
├── manifest.json
├── LICENSE
├── src/
│   ├── catalog_scanner.py           # Unity Catalog metadata discovery
│   ├── metadata_enricher.py         # Business metadata enrichment
│   ├── lineage_mapper.py            # Column-level lineage tracing
│   ├── search_index.py              # Full-text catalog search
│   ├── quality_scorer.py            # Data quality scoring engine
│   └── catalog_reporter.py          # Markdown report generator
├── configs/
│   ├── catalog_config.yaml          # Scanner and enrichment settings
│   ├── classification_rules.yaml    # PII / sensitivity classification
│   └── templates/
│       └── table_doc.md.j2          # Jinja2 table documentation template
├── notebooks/
│   ├── scan_catalog.py              # Run catalog discovery scan
│   └── catalog_dashboard.py         # Catalog health dashboard
├── tests/
│   ├── conftest.py                  # Shared test fixtures
│   ├── test_catalog_scanner.py      # Scanner tests
│   └── test_quality_scorer.py       # Quality scoring tests
└── guides/
    └── data-catalog-guide.md        # Setup and usage guide

Getting Started

1. Configure the Scanner

Edit configs/catalog_config.yaml with your Unity Catalog settings:


scanner:
  catalogs: ["main"]
  schemas: ["bronze", "silver", "gold"]
  exclude_patterns: ["_tmp_*", "_staging_*"]
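
As a rough illustration of what these settings might mean in practice, the sketch below filters candidate tables by schema and by exclude pattern. It assumes the patterns are shell-style globs matched against table names, which the excerpt does not confirm; `should_scan` and the sample table names are hypothetical, and the config is written as a plain dict so the example needs no YAML parser.

```python
from fnmatch import fnmatchcase

# Mirrors the scanner settings as a plain dict so the sketch needs no YAML parser.
scanner_config = {
    "catalogs": ["main"],
    "schemas": ["bronze", "silver", "gold"],
    "exclude_patterns": ["_tmp_*", "_staging_*"],
}


def should_scan(schema: str, table: str, config: dict) -> bool:
    """Return True if the schema is configured and the table matches no exclude pattern."""
    if schema not in config["schemas"]:
        return False
    # Assumption: patterns are case-sensitive globs applied to the table name.
    return not any(fnmatchcase(table, p) for p in config["exclude_patterns"])


# Hypothetical (schema, table) pairs for illustration.
candidates = [
    ("bronze", "orders"),
    ("bronze", "_tmp_orders"),
    ("silver", "_staging_customers"),
    ("sandbox", "orders"),
]
selected = [(s, t) for s, t in candidates if should_scan(s, t, scanner_config)]
print(selected)  # [('bronze', 'orders')]
```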

... continues with setup instructions, usage examples, and more.

📄 Code Sample .py preview

src/catalog_reporter.py

"""
Catalog Reporter — Generate Markdown data dictionaries from Jinja2 templates.

Renders table documentation using a configurable Jinja2 template and writes
the output to the Databricks workspace or local filesystem.

Author: Datanest Digital
"""

from __future__ import annotations

import logging
import os
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional

from jinja2 import Environment, FileSystemLoader, Template

logger = logging.getLogger(__name__)


# ---------------------------------------------------------------------------
# CatalogReporter
# ---------------------------------------------------------------------------

class CatalogReporter:
    """Generate Markdown documentation for catalog tables.

    Uses Jinja2 templates to render per-table data dictionaries that include
    column metadata, tags, classifications, and quality scores.

    Usage::

        reporter = CatalogReporter(template_path="configs/templates/table_doc.md.j2")
        reporter.generate_all(catalog, output_dir="/Workspace/docs/catalog")
    """

    def __init__(
        self,
        template_path: str = "configs/templates/table_doc.md.j2",
    ) -> None:
        self._template = self._load_template(template_path)
        logger.info("CatalogReporter initialised with template: %s", template_path)

    # ----- Template loading -----

    @staticmethod
    def _load_template(path: str) -> Template:
        """Load a Jinja2 template from a file path."""

# ... 133 more lines ...
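
The rest of the reporter is elided in the preview, so the following is only a sketch of the general technique: rendering a per-table Markdown data dictionary with Jinja2. The template body, the table metadata, and the use of `DictLoader` (in place of the file-based loading the real class performs) are assumptions made to keep the example self-contained; the shipped `table_doc.md.j2` may look quite different.

```python
from jinja2 import DictLoader, Environment

# Hypothetical template body; the shipped table_doc.md.j2 is not shown in the preview.
TABLE_DOC = (
    "# {{ table.name }}\n"
    "\n"
    "Owner: {{ table.owner }}\n"
    "\n"
    "| Column | Type | Description |\n"
    "|---|---|---|\n"
    "{% for col in table.columns %}"
    "| {{ col.name }} | {{ col.type }} | {{ col.description }} |\n"
    "{% endfor %}"
)

# DictLoader serves the template from memory; the real class reads it from a file path.
env = Environment(loader=DictLoader({"table_doc.md.j2": TABLE_DOC}))
template = env.get_template("table_doc.md.j2")

# Hypothetical table metadata for illustration.
table = {
    "name": "main.gold.orders",
    "owner": "analytics-team",
    "columns": [
        {"name": "order_id", "type": "BIGINT", "description": "Primary key"},
        {"name": "order_ts", "type": "TIMESTAMP", "description": "Order creation time"},
    ],
}

doc = template.render(table=table)
print(doc)
```

Keeping the layout in a template rather than in Python string formatting means the data dictionary format can be changed by editing the `.j2` file alone.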