← Back to all products

Data Quality Framework

$49

Great Expectations integration, custom validation rules, data profiling scripts, anomaly detection, and quality dashboards.

📁 19 files · 🏷 v1.0.0
Python · YAML · Markdown · JSON · Databricks · Spark · Delta Lake

📁 File Structure (19 files)

data-quality-framework/
├── LICENSE
├── README.md
├── configs/
│   ├── quality_rules.yaml
│   └── thresholds.yaml
├── guides/
│   └── data-quality-strategy.md
├── notebooks/
│   └── run_quality_checks.py
├── src/
│   ├── checks/
│   │   ├── completeness.py
│   │   ├── consistency.py
│   │   ├── custom.py
│   │   ├── freshness.py
│   │   ├── uniqueness.py
│   │   └── validity.py
│   ├── quality_engine.py
│   └── reporters/
│       ├── delta_reporter.py
│       ├── html_reporter.py
│       └── slack_reporter.py
└── tests/
    ├── conftest.py
    └── test_quality_engine.py

📖 Documentation Preview (README excerpt)

Data Quality Framework

Trust your data. A pluggable quality engine with built-in checks for completeness, uniqueness, validity, freshness, and consistency — plus automated reporting to Slack, HTML, and Delta Lake.

By [Datanest Digital](https://datanest.dev) | Version 1.0.0 | $49

---

What You Get

  • Quality Engine — Rule-based engine that loads checks from YAML, executes them against any Spark DataFrame, aggregates results, and produces structured reports
  • 6 Check Types — Completeness (null/empty), uniqueness (duplicates), validity (regex, range, enum), freshness (staleness), consistency (cross-table), and custom (arbitrary SQL expressions)
  • 3 Reporters — Slack webhook notifications, standalone HTML reports, and a Delta Lake audit table writer for historical trending
  • YAML Configuration — Define rules and thresholds in human-readable YAML; no code changes needed to add new checks
  • Databricks Notebook — Ready-to-run notebook for executing quality checks as a scheduled job
  • Strategy Guide — Best practices for implementing data quality at scale
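To give a feel for the YAML-driven approach, here is a sketch of what a rule file such as `configs/quality_rules.yaml` might contain. The field names below are hypothetical illustrations of the six check types, not the framework's actual schema:

```yaml
# Hypothetical quality_rules.yaml sketch — the shipped schema may differ.
tables:
  sales.orders:
    checks:
      - type: completeness        # null/empty field check
        columns: [order_id, customer_id]
      - type: uniqueness          # duplicate detection
        columns: [order_id]
      - type: validity            # regex / range / enum validation
        column: status
        allowed_values: [NEW, SHIPPED, CANCELLED]
      - type: freshness           # data staleness
        column: updated_at
        max_age_hours: 24
      - type: custom              # arbitrary SQL expression
        expression: "amount >= 0"
```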

File Tree


data-quality-framework/
├── README.md
├── manifest.json
├── LICENSE
├── src/
│   ├── quality_engine.py              # Core engine: load, execute, report
│   ├── checks/
│   │   ├── completeness.py            # Null/empty field checks
│   │   ├── uniqueness.py              # Duplicate detection
│   │   ├── validity.py                # Regex, range, enum validation
│   │   ├── freshness.py               # Data staleness checks
│   │   ├── consistency.py             # Cross-table consistency
│   │   └── custom.py                  # Arbitrary SQL expression checks
│   └── reporters/
│       ├── slack_reporter.py          # Slack webhook notifications
│       ├── html_reporter.py           # Standalone HTML report
│       └── delta_reporter.py          # Delta Lake audit table writer
├── configs/
│   ├── quality_rules.yaml             # Rule definitions
│   └── thresholds.yaml                # Pass/warn/fail thresholds
├── notebooks/
│   └── run_quality_checks.py          # Databricks notebook
├── tests/
│   ├── conftest.py                    # Shared fixtures
│   └── test_quality_engine.py         # Unit tests
└── guides/
    └── data-quality-strategy.md       # Best practices guide

Getting Started

1. Define your quality rules

... continues with setup instructions, usage examples, and more.

📄 Code Sample (.py preview)

src/quality_engine.py

```python
"""
Quality Engine — Core engine for loading rules, executing checks, and
producing reports.

Orchestrates the full data quality lifecycle:
- Load rule definitions from YAML configuration
- Execute checks against Spark DataFrames
- Aggregate results with pass/warn/fail thresholds
- Generate structured quality reports

Designed for Databricks Runtime — uses global `spark` session.

By Datanest Digital | https://datanest.dev
"""
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Callable, Optional

import yaml
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

from src.checks.completeness import CompletenessCheck
from src.checks.consistency import ConsistencyCheck
from src.checks.custom import CustomCheck
from src.checks.freshness import FreshnessCheck
from src.checks.uniqueness import UniquenessCheck
from src.checks.validity import ValidityCheck


class CheckStatus(Enum):
    """Result status for a quality check."""

    PASSED = "passed"
    WARNING = "warning"
    FAILED = "failed"
    ERROR = "error"


@dataclass
class ThresholdConfig:
    """Pass/warn/fail threshold configuration."""

    default_pass: float = 1.0
    default_warn: float = 0.95
    table_overrides: dict[str, dict[str, float]] = field(default_factory=dict)

# ... 368 more lines ...
```
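To illustrate how the `CheckStatus` and `ThresholdConfig` types from the preview could drive pass/warn/fail aggregation, here is a minimal, self-contained sketch. The `for_table` and `resolve_status` helpers are hypothetical, not part of the shipped engine, and the threshold resolution shown is an assumption about how per-table overrides might work:

```python
from dataclasses import dataclass, field
from enum import Enum


class CheckStatus(Enum):
    """Result status for a quality check (mirrors the preview)."""
    PASSED = "passed"
    WARNING = "warning"
    FAILED = "failed"
    ERROR = "error"


@dataclass
class ThresholdConfig:
    """Pass/warn/fail threshold configuration (mirrors the preview)."""
    default_pass: float = 1.0
    default_warn: float = 0.95
    table_overrides: dict[str, dict[str, float]] = field(default_factory=dict)

    def for_table(self, table: str) -> tuple[float, float]:
        # Hypothetical helper: return (pass, warn) thresholds,
        # honoring any per-table overrides.
        override = self.table_overrides.get(table, {})
        return (
            override.get("pass", self.default_pass),
            override.get("warn", self.default_warn),
        )


def resolve_status(pass_rate: float, table: str, cfg: ThresholdConfig) -> CheckStatus:
    """Hypothetical helper: map a check's pass rate to a status."""
    pass_at, warn_at = cfg.for_table(table)
    if pass_rate >= pass_at:
        return CheckStatus.PASSED
    if pass_rate >= warn_at:
        return CheckStatus.WARNING
    return CheckStatus.FAILED


cfg = ThresholdConfig(table_overrides={"sales.orders": {"pass": 0.99, "warn": 0.9}})
print(resolve_status(1.0, "dim.customers", cfg).value)   # passed  (default thresholds)
print(resolve_status(0.97, "dim.customers", cfg).value)  # warning (0.95 <= rate < 1.0)
print(resolve_status(0.95, "sales.orders", cfg).value)   # warning (override: 0.9 <= rate < 0.99)
print(resolve_status(0.5, "sales.orders", cfg).value)    # failed
```

In a real run the pass rate would come from a Spark aggregation (e.g. the fraction of non-null rows for a completeness check); the sketch only shows the threshold logic, which needs no Spark session.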