$49
Data Quality Framework
Great Expectations integration, custom validation rules, data profiling scripts, anomaly detection, and quality dashboards.
Python · YAML · Markdown · JSON · Databricks · Spark · Delta Lake
📁 File Structure 19 files
data-quality-framework/
├── LICENSE
├── README.md
├── configs/
│   ├── quality_rules.yaml
│   └── thresholds.yaml
├── guides/
│   └── data-quality-strategy.md
├── manifest.json
├── notebooks/
│   └── run_quality_checks.py
├── src/
│   ├── checks/
│   │   ├── completeness.py
│   │   ├── consistency.py
│   │   ├── custom.py
│   │   ├── freshness.py
│   │   ├── uniqueness.py
│   │   └── validity.py
│   ├── quality_engine.py
│   └── reporters/
│       ├── delta_reporter.py
│       ├── html_reporter.py
│       └── slack_reporter.py
└── tests/
    ├── conftest.py
    └── test_quality_engine.py
📖 Documentation Preview README excerpt
Data Quality Framework
Trust your data. A pluggable quality engine with built-in checks for completeness,
uniqueness, validity, freshness, and consistency — plus automated reporting to Slack,
HTML, and Delta Lake.
By [Datanest Digital](https://datanest.dev) | Version 1.0.0 | $49
---
What You Get
- Quality Engine — Rule-based engine that loads checks from YAML, executes them
against any Spark DataFrame, aggregates results, and produces structured reports
- 6 Check Types — Completeness (null/empty), uniqueness (duplicates), validity
(regex, range, enum), freshness (staleness), consistency (cross-table), and custom
(arbitrary SQL expressions)
- 3 Reporters — Slack webhook notifications, standalone HTML reports, and Delta Lake
audit table writer for historical trending
- YAML Configuration — Define rules and thresholds in human-readable YAML; add or
modify rules without touching code
- Databricks Notebook — Ready-to-run notebook for executing quality checks as a
scheduled job
- Strategy Guide — Best practices for implementing data quality at scale
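To make the validity check type above concrete, here is a minimal pure-Python sketch of the three validity modes (regex, range, enum). It is illustrative only: the shipped checks run against Spark DataFrames, and the function name and rule keys below (`validate_value`, `kind`, `pattern`, `allowed`) are assumptions, not the product's API.

```python
import re

def validate_value(value, rule):
    """Return True if `value` satisfies a regex, range, or enum rule.

    Hypothetical sketch of the three validity modes the product describes;
    the real checks evaluate whole Spark DataFrame columns, not scalars.
    """
    kind = rule["kind"]
    if kind == "regex":
        return re.fullmatch(rule["pattern"], str(value)) is not None
    if kind == "range":
        return rule["min"] <= value <= rule["max"]
    if kind == "enum":
        return value in rule["allowed"]
    raise ValueError(f"unknown rule kind: {kind}")

# Example rules
email_rule = {"kind": "regex", "pattern": r"[^@]+@[^@]+\.[^@]+"}
age_rule = {"kind": "range", "min": 0, "max": 120}
status_rule = {"kind": "enum", "allowed": {"active", "inactive"}}
```

In Spark, the same logic would typically be expressed as a column expression (e.g. `F.col("email").rlike(...)`) and aggregated into a pass rate across the DataFrame.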
File Tree
data-quality-framework/
├── README.md
├── manifest.json
├── LICENSE
├── src/
│   ├── quality_engine.py          # Core engine: load, execute, report
│   ├── checks/
│   │   ├── completeness.py        # Null/empty field checks
│   │   ├── uniqueness.py          # Duplicate detection
│   │   ├── validity.py            # Regex, range, enum validation
│   │   ├── freshness.py           # Data staleness checks
│   │   ├── consistency.py         # Cross-table consistency
│   │   └── custom.py              # Arbitrary SQL expression checks
│   └── reporters/
│       ├── slack_reporter.py      # Slack webhook notifications
│       ├── html_reporter.py       # Standalone HTML report
│       └── delta_reporter.py      # Delta Lake audit table writer
├── configs/
│   ├── quality_rules.yaml         # Rule definitions
│   └── thresholds.yaml            # Pass/warn/fail thresholds
├── notebooks/
│   └── run_quality_checks.py      # Databricks notebook
├── tests/
│   ├── conftest.py                # Shared fixtures
│   └── test_quality_engine.py     # Unit tests
└── guides/
    └── data-quality-strategy.md   # Best practices guide
Getting Started
1. Define your quality rules
... continues with setup instructions, usage examples, and more.
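Step 1 means editing configs/quality_rules.yaml. The exact schema is not shown in this preview, so the keys below (`checks`, `type`, `table`, `column`, `severity`) are illustrative guesses at what such a rules file might look like, not the shipped format:

```yaml
# Hypothetical sketch — key names are assumptions, not the shipped schema.
checks:
  - name: orders_id_not_null
    type: completeness
    table: sales.orders
    column: order_id
    severity: fail
  - name: orders_id_unique
    type: uniqueness
    table: sales.orders
    column: order_id
    severity: warn
  - name: orders_fresh_24h
    type: freshness
    table: sales.orders
    timestamp_column: updated_at
    max_age_hours: 24
```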
📄 Code Sample .py preview
src/quality_engine.py
"""
Quality Engine — Core engine for loading rules, executing checks, and producing reports.
Orchestrates the full data quality lifecycle:
- Load rule definitions from YAML configuration
- Execute checks against Spark DataFrames
- Aggregate results with pass/warn/fail thresholds
- Generate structured quality reports
Designed for Databricks Runtime — uses global `spark` session.
By Datanest Digital | https://datanest.dev
"""
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Callable, Optional
import yaml
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F
from src.checks.completeness import CompletenessCheck
from src.checks.consistency import ConsistencyCheck
from src.checks.custom import CustomCheck
from src.checks.freshness import FreshnessCheck
from src.checks.uniqueness import UniquenessCheck
from src.checks.validity import ValidityCheck
class CheckStatus(Enum):
    """Result status for a quality check."""

    PASSED = "passed"
    WARNING = "warning"
    FAILED = "failed"
    ERROR = "error"


@dataclass
class ThresholdConfig:
    """Pass/warn/fail threshold configuration."""

    default_pass: float = 1.0
    default_warn: float = 0.95
    table_overrides: dict[str, dict[str, float]] = field(default_factory=dict)
# ... 368 more lines ...
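The `ThresholdConfig` dataclass above suggests per-table threshold resolution: an override for a given table wins, otherwise the defaults apply. A minimal standalone sketch of that lookup follows; `resolve_thresholds` is a hypothetical helper (the engine's actual method is in the elided 368 lines), and only the fields shown in the preview are reproduced.

```python
from dataclasses import dataclass, field

@dataclass
class ThresholdConfig:
    """Mirror of the preview's ThresholdConfig (fields as shown above)."""
    default_pass: float = 1.0
    default_warn: float = 0.95
    table_overrides: dict[str, dict[str, float]] = field(default_factory=dict)

def resolve_thresholds(cfg: ThresholdConfig, table: str) -> tuple[float, float]:
    """Hypothetical helper: per-table override wins, else defaults apply."""
    override = cfg.table_overrides.get(table, {})
    return (
        override.get("pass", cfg.default_pass),
        override.get("warn", cfg.default_warn),
    )

cfg = ThresholdConfig(table_overrides={"sales.orders": {"warn": 0.99}})
print(resolve_thresholds(cfg, "sales.orders"))   # (1.0, 0.99)
print(resolve_thresholds(cfg, "hr.employees"))   # (1.0, 0.95)
```

A pass rate at or above the `pass` threshold would map to `CheckStatus.PASSED`, between `warn` and `pass` to `WARNING`, and below `warn` to `FAILED`.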