Streaming Pipeline Kit
Kafka consumer/producer templates, Spark Structured Streaming jobs, exactly-once processing patterns, and dead letter queues.
Real-time data pipelines that just work. Production-ready Spark Structured Streaming
templates with Kafka integration, exactly-once semantics, and Delta Lake sinks.
By [Datanest Digital](https://datanest.dev) | Version 1.0.0 | $49
---
What You Get
- Kafka Consumer — Structured Streaming reader with schema registry integration,
watermarking, and configurable checkpointing
- Event Processor — Deduplication by event ID, late-arrival handling, and windowed
aggregations with customizable window sizes
- Delta Lake Writer — foreachBatch sink with merge/append modes, schema evolution,
and inline data quality checks
- Stream Monitor — Query progress listener, consumer lag tracking, dead letter queue
routing, and alerting hooks
- Schema Registry Client — Confluent-compatible client for fetching, registering,
and validating Avro schemas with compatibility checks
- Databricks Notebooks — Ready-to-run notebooks for starting streams and monitoring
active queries in real time
- Avro Schemas — Example schemas for user events and order events
- Streaming Patterns Guide — Best practices for watermarks, triggers, state management,
and failure recovery
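
To make the Event Processor's "deduplication by event ID" and "late-arrival handling" more concrete, here is a minimal pure-Python sketch of the general idea. It is not the kit's code and does not use Spark. The names `Event`, `WatermarkDeduplicator`, and `allowed_lateness` are hypothetical, and sending late events to a dead-letter list is an assumption made for illustration (Spark itself drops rows older than the watermark and advances the watermark per micro-batch, not per event).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Event:
    """Hypothetical event shape: an ID, an event-time timestamp, and a payload."""
    event_id: str
    event_time: datetime
    payload: dict = field(default_factory=dict)


class WatermarkDeduplicator:
    """Toy model of watermark-bounded deduplication (illustrative only)."""

    def __init__(self, allowed_lateness: timedelta):
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None   # highest event time seen so far
        self.seen = {}               # event_id -> event_time (dedup state)
        self.dead_letters = []       # (event, reason) pairs for rejected events

    @property
    def watermark(self):
        """Events older than this timestamp are treated as too late."""
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def process(self, event):
        """Return the event if it should be emitted downstream, else None."""
        wm = self.watermark
        if wm is not None and event.event_time < wm:
            # Assumption: late events go to a dead-letter list instead of being dropped.
            self.dead_letters.append((event, "late"))
            return None
        if event.event_id in self.seen:
            return None  # duplicate inside the watermark window
        self.seen[event.event_id] = event.event_time
        if self.max_event_time is None or event.event_time > self.max_event_time:
            self.max_event_time = event.event_time
            # Evict state that fell behind the new watermark so memory stays bounded.
            new_wm = self.watermark
            self.seen = {k: t for k, t in self.seen.items() if t >= new_wm}
        return event
```

The point of the sketch is the trade-off that watermarks introduce: dedup state can be evicted once it falls behind the watermark, so memory stays bounded, but a duplicate or late event arriving after that point can no longer be matched against its original.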
File Tree
streaming-pipeline-kit/
├── README.md
├── manifest.json
├── LICENSE
├── src/
│   ├── kafka_consumer.py           # Kafka source with schema registry
│   ├── event_processor.py          # Dedup, late arrivals, windowed aggs
│   ├── stream_to_delta.py          # foreachBatch Delta Lake writer
│   ├── stream_monitor.py           # Progress listener & lag monitoring
│   └── schema_registry.py          # Schema Registry client
├── configs/
│   ├── streaming_config.yaml       # Kafka, checkpoint, trigger settings
│   └── schemas/
│       ├── user_events.avsc        # User event Avro schema
│       └── order_events.avsc       # Order event Avro schema
├── notebooks/
│   ├── start_stream.py             # Launch streaming pipeline
│   └── monitor_streams.py          # Real-time monitoring dashboard
├── tests/
│   └── test_event_processor.py     # Unit tests for event processing
└── guides/
    └── streaming-patterns.md       # Patterns & best practices
Getting Started
1. Configure your environment
Edit configs/streaming_config.yaml with your Kafka bootstrap servers,
schema registry URL, and checkpoint locations:
... continues with setup instructions, usage examples, and more.