Real-Time Streaming Toolkit
Production streaming patterns for Structured Streaming and Delta Live Tables, with Kafka and Azure Event Hubs integration and monitoring dashboards.
Product ID: real-time-streaming-toolkit
Version: 1.0.0
Price: $69
Category: Data Engineering
Author: [Datanest Digital](https://datanest.dev)
---
Overview
The Real-Time Streaming Toolkit is a production-grade collection of PySpark notebooks, Delta Live Tables pipelines, monitoring dashboards, and operational guides for building real-time data pipelines on Databricks. Every component has been tested against high-throughput workloads and is designed for exactly-once processing, which in Structured Streaming depends on a replayable source, checkpointing, and an idempotent or transactional sink.
Whether you are ingesting from Kafka, Azure Event Hubs, or cloud object storage via Auto Loader, this toolkit gives you a proven starting point that eliminates weeks of trial-and-error engineering.
What's Included
Structured Streaming Notebooks
| File | Description |
|------|-------------|
| structured-streaming/kafka_source.py | Kafka source with schema registry integration, checkpoint management, and consumer group orchestration |
| structured-streaming/event_hub_source.py | Azure Event Hubs source with native checkpoint store, partition-aware processing, and backpressure handling |
| structured-streaming/auto_loader_streaming.py | Auto Loader (cloudFiles) patterns for file-based streaming from S3, ADLS, and GCS |
| structured-streaming/deduplication.py | Exactly-once processing strategies including watermark-based deduplication, idempotent writes, and state management |
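
To give a feel for the kind of pattern these notebooks cover, here is a minimal sketch of a Kafka read followed by watermark-based deduplication. It is not the toolkit's code: the function names, the `event_id` and `event_time` columns, and the JSON payload format are assumptions made for illustration, and schema registry integration is left out for brevity.

```python
from typing import Dict


def kafka_source_options(bootstrap_servers: str, topic: str,
                         max_offsets_per_trigger: int = 10000) -> Dict[str, str]:
    """Build options for the Structured Streaming Kafka source (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",
        # Caps the number of offsets read per micro-batch (a simple rate limit).
        "maxOffsetsPerTrigger": str(max_offsets_per_trigger),
    }


def read_deduplicated(spark, options: Dict[str, str], event_schema: str,
                      watermark: str = "10 minutes"):
    """Read JSON events from Kafka and drop duplicates within the watermark.

    Assumes each message value is JSON containing `event_id` and `event_time`
    (both hypothetical column names). `event_schema` is a DDL string such as
    "event_id STRING, event_time TIMESTAMP, payload STRING".
    """
    # Imported lazily so the options helper can be used without Spark installed.
    from pyspark.sql import functions as F

    raw = spark.readStream.format("kafka").options(**options).load()
    events = raw.select(
        F.from_json(F.col("value").cast("string"), event_schema).alias("e")
    ).select("e.*")
    # Including the event-time column in the key lets Spark expire old
    # deduplication state once the watermark passes it.
    return (events
            .withWatermark("event_time", watermark)
            .dropDuplicates(["event_id", "event_time"]))
```

On a cluster, `read_deduplicated(spark, kafka_source_options("broker:9092", "events"), schema)` would return a streaming DataFrame; the broker address and topic name here are placeholders.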
Delta Live Tables Pipelines
| File | Description |
|------|-------------|
| dlt/streaming_medallion.py | Full medallion architecture (Bronze/Silver/Gold) as a streaming DLT pipeline |
| dlt/cdc_processing.py | Change Data Capture processing with APPLY CHANGES INTO for SCD Type 1 and Type 2 |
| dlt/expectations_library.py | Reusable DLT data quality expectations library with severity levels and alerting hooks |
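
As an illustration of what an expectations library with severity levels might look like, the sketch below groups rules by severity and attaches them with the DLT decorators `expect_all`, `expect_all_or_drop`, and `expect_all_or_fail`. The rule names, constraints, table names, and helper functions are all hypothetical; the toolkit's actual library is not shown in this excerpt.

```python
from typing import Dict

# Hypothetical rules, grouped by how a violation should be handled.
EXPECTATIONS: Dict[str, Dict[str, str]] = {
    "warn": {"non_negative_amount": "amount >= 0"},
    "drop": {"has_event_id": "event_id IS NOT NULL"},
    "fail": {"known_schema_version": "schema_version IN (1, 2)"},
}


def rules_for(severity: str) -> Dict[str, str]:
    """Return a copy of the rules for one severity level (empty if unknown)."""
    return dict(EXPECTATIONS.get(severity, {}))


def define_silver_events(dlt_module):
    """Register a silver table with all three severity levels applied.

    `dlt_module` is the `dlt` module, which is only importable inside a
    Delta Live Tables pipeline. Passing it in keeps this file importable
    elsewhere. `bronze_events` and `silver_events` are assumed table names.
    """
    @dlt_module.table(name="silver_events")
    @dlt_module.expect_all(rules_for("warn"))          # record violations, keep rows
    @dlt_module.expect_all_or_drop(rules_for("drop"))  # drop violating rows
    @dlt_module.expect_all_or_fail(rules_for("fail"))  # fail the update
    def silver_events():
        return dlt_module.read_stream("bronze_events")

    return silver_events
```

Inside a pipeline notebook this would be used as `import dlt` followed by `define_silver_events(dlt)`.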
Monitoring & Alerting
| File | Description |
|------|-------------|
| monitoring/streaming_dashboard.sql | SQL dashboard queries for streaming lag, throughput, error rates, and checkpoint health |
| monitoring/streaming_alerts.sql | Alert queries for detecting pipeline failures, excessive lag, and data quality regressions |
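
The SQL queries themselves are not part of this excerpt. As a rough Python counterpart, the sketch below pulls throughput and batch-duration figures out of a progress record, assuming the dictionary form that `StreamingQuery.lastProgress` returns in PySpark. The function name and the `falling_behind` heuristic are assumptions for illustration, not the toolkit's alert logic.

```python
from typing import Any, Dict


def summarize_progress(progress: Dict[str, Any]) -> Dict[str, Any]:
    """Extract a few health metrics from one streaming progress record."""
    input_rate = float(progress.get("inputRowsPerSecond") or 0.0)
    processed_rate = float(progress.get("processedRowsPerSecond") or 0.0)
    durations = progress.get("durationMs") or {}
    return {
        "batch_id": progress.get("batchId"),
        "num_input_rows": progress.get("numInputRows", 0),
        "input_rows_per_second": input_rate,
        "processed_rows_per_second": processed_rate,
        "trigger_execution_ms": durations.get("triggerExecution"),
        # Simple heuristic: data arrives faster than it is processed.
        "falling_behind": input_rate > processed_rate,
    }
```

A monitoring job could call `summarize_progress(query.lastProgress)` on a schedule and write the result to a Delta table for dashboards to query.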
Configuration Guides
| File | Description |
|------|-------------|
| config/autoscaling_config.md | Auto-scaling configuration for streaming clusters with recommended instance types and scaling policies |
| config/checkpoint_management.md | Checkpoint repair, migration, and disaster recovery procedures |
Operational Guides
| File | Description |
|------|-------------|
| guides/performance_tuning.md | Trigger intervals, partition sizing, shuffle optimization, and state store tuning |
| guides/failure_recovery.md | Failure recovery playbook covering checkpoint corruption, schema evolution, and cluster failures |
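
For orientation, here is a small sketch of the knobs the tuning and checkpoint guides refer to: shuffle partitions, the RocksDB state store provider, a processing-time trigger, and a per-query checkpoint location. The helper names and default values are assumptions, not recommendations from the guides.

```python
from typing import Dict


def streaming_tuning_conf(shuffle_partitions: int = 200) -> Dict[str, str]:
    """Spark settings commonly adjusted for stateful streaming (hypothetical helper)."""
    return {
        # For stateful queries this is fixed by the checkpoint after the first
        # run, so it should be chosen before the query first starts.
        "spark.sql.shuffle.partitions": str(shuffle_partitions),
        # RocksDB-backed state store, available in Spark 3.2 and later.
        "spark.sql.streaming.stateStore.providerClass":
            "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider",
    }


def start_delta_sink(df, table_name: str, checkpoint_location: str,
                     trigger_interval: str = "30 seconds"):
    """Start an append-mode Delta sink with its own checkpoint directory."""
    return (df.writeStream
            .format("delta")
            .outputMode("append")
            # Each query needs its own checkpoint location for recovery.
            .option("checkpointLocation", checkpoint_location)
            .trigger(processingTime=trigger_interval)
            .toTable(table_name))
```

The settings can be applied with `for k, v in streaming_tuning_conf(64).items(): spark.conf.set(k, v)` before the query starts.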
Requirements
- Databricks Runtime 13.3 LTS or later (14.x+ recommended)
- Unity Catalog-enabled workspace (recommended)
... continues with setup instructions, usage examples, and more.