04 projects · pipelines, lakes, warehouses

Data engineering

Streaming platforms, cloud-native lakes, and dimensional warehouses — the systems that move and shape data before anyone sees a dashboard.

Pipeline diagram (to be added): MSSQL → Debezium → Kafka → Flink (scoring) → Airflow → Snowflake (dbt star schema), showing CDC and streaming ML.

Streaming · CDC

Real-time fraud detection platform

Repository ↗

A production-style platform for scoring financial transactions for fraud in real time, built around change-data-capture streaming and dimensional modeling.

  • CDC pipeline (Debezium + Kafka) ingesting transactional data from MSSQL
  • Real-time enrichment, feature engineering, and ML-based fraud scoring with Apache Flink
  • Batch and streaming ELT orchestrated with Airflow, loading into Snowflake
  • Dimensional modeling with dbt following Kimball methodology
Python · Kafka · Debezium · Apache Flink · Airflow · dbt · Snowflake · Docker · S3
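
As an illustration of the CDC step, the sketch below unpacks a Debezium change event in plain Python. The envelope fields (`op`, `before`, `after`, `ts_ms`) and the operation codes follow Debezium's documented event format; the `txn_id` and `amount` columns and the `parse_change_event` helper are hypothetical and are not taken from the repository.

```python
import json
from typing import Any, Optional

# Debezium's operation codes in the change-event envelope.
OPERATIONS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}


def parse_change_event(raw: str) -> Optional[dict[str, Any]]:
    """Flatten one Debezium change event into an operation plus a row image."""
    event = json.loads(raw)
    if not isinstance(event, dict):
        return None  # e.g. a JSON null; nothing to process
    # When the JSON converter embeds schemas, the envelope sits under "payload".
    payload = event.get("payload", event)
    op = payload.get("op")
    if op not in OPERATIONS:
        return None  # not a row-level change this sketch understands
    # Deletes carry the last row image in "before"; other operations use "after".
    row = payload["before"] if op == "d" else payload["after"]
    return {"operation": OPERATIONS[op], "row": row, "ts_ms": payload.get("ts_ms")}


# Hypothetical event for an invented transactions table.
sample = json.dumps({
    "payload": {
        "op": "c",
        "before": None,
        "after": {"txn_id": 101, "amount": 250.0},
        "ts_ms": 1700000000000,
    }
})
print(parse_change_event(sample))
```

A downstream consumer could route on `operation` so that inserts and updates feed the scoring job while deletes are only recorded.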

Pipeline diagram (to be added): Lambda architecture with yfinance batch + Kafka stream → HDFS medallion (raw/staging/curated) → Spark → Hive → Power BI, orchestrated by Airflow.

Big data · Lambda

EGX big data pipeline

Repository ↗

A production-grade Lambda-architecture pipeline for Egyptian Exchange market data, processing daily OHLCV batches alongside real-time tick streams.

  • Daily OHLCV batches pulled with yfinance, plus real-time tick data streamed through Kafka
  • Medallion lake on HDFS (raw → staging → curated) with PySpark computing SMA, EMA, RSI, MACD, Bollinger Bands
  • Four Airflow DAGs orchestrating batch ingestion, Spark ETL, Hive view refresh, and streaming
  • Live Power BI reporting via Simba Hive ODBC
Python · PySpark · Kafka · HDFS · Hive · Airflow · Docker · yfinance · Power BI
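
To make the indicator step concrete, here is a minimal sketch of SMA, EMA, and the MACD line in plain Python. The project computes these with PySpark; this version only illustrates the arithmetic. The function names, the choice to seed the EMA with the first price, and the 12/26 spans (the conventional MACD defaults) are assumptions rather than details from the repository.

```python
def sma(prices: list[float], window: int) -> list[float]:
    """Simple moving average: one value per full window of prices."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


def ema(prices: list[float], span: int) -> list[float]:
    """Exponential moving average with alpha = 2 / (span + 1).

    Assumption: the series is seeded with the first price.
    """
    if span <= 0:
        raise ValueError("span must be positive")
    alpha = 2 / (span + 1)
    out: list[float] = []
    for price in prices:
        previous = out[-1] if out else price
        out.append(alpha * price + (1 - alpha) * previous)
    return out


def macd_line(prices: list[float], fast: int = 12, slow: int = 26) -> list[float]:
    """MACD line: fast EMA minus slow EMA, element by element."""
    return [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]


closes = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up closing prices
print(sma(closes, 3))
print(ema(closes, 3))
```

In a Spark job the same calculations would typically be expressed with window functions partitioned by ticker and ordered by date.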

Pipeline diagram (to be added): GCP architecture with scrapers → Pub/Sub → Dataflow (dedup, DLQ) → GCS lake → BigQuery warehouse → FastAPI + Firestore, including the analytics layer.

Cloud · Serverless

GCP e-commerce data platform

Repository ↗

A serverless, end-to-end platform on Google Cloud that ingests multi-site e-commerce data for price comparison and seller analytics.

  • Streaming ingestion with exact and fuzzy deduplication, dead-letter queues, and incremental loads
  • GCS data lake feeding a BigQuery warehouse, with FastAPI and Firestore for real-time access
  • Analytics layer for price tracking, seller trust scoring, sentiment analysis, and automated alerts
Python · Cloud Run · Pub/Sub · Dataflow · BigQuery · Cloud Storage · Firestore · SQL
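
The exact-plus-fuzzy deduplication idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not the Dataflow implementation: the `site` and `title` fields, the 0.9 similarity threshold, and the rule that duplicates are only detected within one site are all hypothetical.

```python
import hashlib
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivially different titles compare equal."""
    return " ".join(title.lower().split())


def deduplicate(records: list[dict], threshold: float = 0.9) -> list[dict]:
    """Keep the first occurrence of each listing per site.

    Exact duplicates are caught by a hash of (site, normalized title);
    near duplicates by a similarity ratio against listings already kept.
    """
    seen_keys: set[str] = set()
    kept: list[dict] = []
    for record in records:
        title = normalize(record["title"])
        key = hashlib.sha256(f"{record['site']}|{title}".encode("utf-8")).hexdigest()
        if key in seen_keys:
            continue  # exact duplicate
        is_near_duplicate = any(
            other["site"] == record["site"]
            and SequenceMatcher(None, title, normalize(other["title"])).ratio() >= threshold
            for other in kept
        )
        if is_near_duplicate:
            continue  # fuzzy duplicate
        seen_keys.add(key)
        kept.append(record)
    return kept


listings = [  # made-up scraped listings
    {"site": "a", "title": "Apple iPhone 13 128GB"},
    {"site": "a", "title": "apple  iphone 13 128gb"},
    {"site": "a", "title": "Apple iPhone 13 128 GB"},
    {"site": "a", "title": "Samsung Galaxy S21"},
    {"site": "b", "title": "Apple iPhone 13 128GB"},
]
for item in deduplicate(listings):
    print(item)
```

The pairwise comparison is quadratic in the number of kept records, so a streaming job would normally bound it with a time window or a blocking key.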

Pipeline diagram (to be added): SSIS ETL from OLTP sources → staging → SQL Server star schema (fact + dim tables), with SCD2 highlighted.

Warehouse · SSIS

E-commerce data warehouse

Repository ↗

An end-to-end dimensional data warehouse for retail analytics, built for accurate historical reporting.

  • Star schema of fact and dimension tables for efficient analytical queries
  • ETL pipelines built with SSIS to extract, transform, and load operational data
  • SCD Type 2 implementation for tracking historical changes accurately
SQL Server · T-SQL · SSIS · Star schema · ETL
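
The SCD Type 2 pattern is easier to see in a few lines of code. The project implements it with SSIS and T-SQL; the Python sketch below only illustrates the logic, and the `customer_id`, `customer_sk`, `valid_from`, `valid_to`, and `is_current` column names are assumptions, not the warehouse's actual schema.

```python
from datetime import date


def apply_scd2(dimension: list[dict], customer_id: int, attributes: dict, effective: date) -> None:
    """Apply one source change to a Type 2 dimension held as a list of rows.

    An unchanged row is left alone; a changed row is closed out and a new
    current row is appended, so history is preserved.
    """
    current = next(
        (row for row in dimension if row["customer_id"] == customer_id and row["is_current"]),
        None,
    )
    if current is not None:
        if all(current.get(name) == value for name, value in attributes.items()):
            return  # no attribute changed, so no new version is needed
        current["valid_to"] = effective  # close the old version
        current["is_current"] = False
    dimension.append({
        "customer_sk": len(dimension) + 1,  # hypothetical surrogate key
        "customer_id": customer_id,
        **attributes,
        "valid_from": effective,
        "valid_to": None,
        "is_current": True,
    })


dim_customer: list[dict] = []
apply_scd2(dim_customer, 42, {"city": "Cairo"}, date(2024, 1, 1))
apply_scd2(dim_customer, 42, {"city": "Cairo"}, date(2024, 2, 1))  # unchanged
apply_scd2(dim_customer, 42, {"city": "Giza"}, date(2024, 3, 1))   # new version
for row in dim_customer:
    print(row)
```

Fact rows would join on the surrogate key, so a sale recorded in February still reports against the Cairo version of the customer.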