Distributed systems, data pipelines, cloud infrastructure, and open-source developer tools.
VS Code extension to view and explore binary data files directly in the editor. Supports 12 formats: pkl, h5, parquet, feather, joblib, npy, npz, msgpack, arrow, avro, nc, and mat. Implemented a Python backend with isolated virtual environments for safe, on-demand data parsing. Optimized file loading to handle large datasets without editor freezes.
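A minimal sketch of the format-dispatch idea behind the extension (the loader table, function, and preview size here are illustrative, not the extension's actual code; the real parsing runs inside an isolated virtual environment):

```python
from pathlib import Path

import numpy as np
import pandas as pd

# Illustrative subset of the 12 supported formats, mapped to loader callables.
LOADERS = {
    ".npy": np.load,
    ".parquet": pd.read_parquet,
    ".feather": pd.read_feather,
    ".pkl": pd.read_pickle,
}

def preview(path: str, rows: int = 20):
    """Load a binary data file and return a small preview, never the full dataset."""
    suffix = Path(path).suffix.lower()
    loader = LOADERS.get(suffix)
    if loader is None:
        raise ValueError(f"unsupported format: {suffix}")
    data = loader(path)
    # Truncating here is what keeps large files from freezing the editor-facing side.
    if isinstance(data, pd.DataFrame):
        return data.head(rows)
    if isinstance(data, np.ndarray):
        return data[:rows]
    return data  # other pickled objects are returned as-is
```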
Production-ready Terraform template supporting dev, staging, and prod environments. Modular IaC architecture with reusable components for VPC, ECS, RDS, ALB, ECR, Route53, and remote state management. Implements multi-environment patterns using for_each loops and environment conditionals.
Distributed parallelization engine using Docker, Celery, and RabbitMQ for scalable task execution. Enables dynamic worker scaling across multiple nodes for compute-intensive workloads. Focused on fault tolerance, task retries, and throughput optimization for real-world data pipelines.
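A hedged sketch of the worker-side pattern (broker URL, task body, and retry policy are placeholders, not the project's actual configuration):

```python
import random

from celery import Celery

# Placeholder broker/backend URLs; in the deployment described above,
# RabbitMQ and the workers run as Docker containers.
app = Celery("engine", broker="amqp://guest:guest@rabbitmq:5672//", backend="rpc://")

class TransientError(Exception):
    """Stand-in for a recoverable failure (network blip, busy node, ...)."""

def do_heavy_work(chunk_id: int) -> int:
    # Placeholder for the real compute-intensive job.
    if random.random() < 0.1:
        raise TransientError("simulated transient failure")
    return chunk_id * 2

@app.task(bind=True, max_retries=3, default_retry_delay=10, acks_late=True)
def process_chunk(self, chunk_id: int) -> int:
    """One unit of work; acks_late plus retries gives at-least-once execution."""
    try:
        return do_heavy_work(chunk_id)
    except TransientError as exc:
        # Re-enqueue the task; Celery gives up after max_retries attempts.
        raise self.retry(exc=exc)
```

Scaling out is then just starting more worker containers against the same broker, e.g. `celery -A tasks worker --concurrency=8` if this module were saved as `tasks.py`.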
End-to-end ETL pipeline that processes traffic accident data to identify patterns and insights. Built with Apache Airflow for orchestration and Spark for large-scale data processing. Includes data visualization dashboards for exploring collision trends.
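A minimal sketch of what the orchestration layer could look like (DAG id, schedule, and job paths are hypothetical; `SparkSubmitOperator` comes from the `apache-airflow-providers-apache-spark` package):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="accidents_etl",           # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Each stage submits a Spark job; Airflow handles ordering and retries.
    extract = SparkSubmitOperator(
        task_id="extract_raw",
        application="/opt/jobs/extract_accidents.py",    # hypothetical job path
        conn_id="spark_default",
    )
    transform = SparkSubmitOperator(
        task_id="clean_and_aggregate",
        application="/opt/jobs/transform_accidents.py",  # hypothetical job path
        conn_id="spark_default",
    )
    extract >> transform
```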
Multi-node Apache Airflow cluster with distributed schedulers, metadata DB replication using Patroni, self-healing capabilities, and Prometheus-Grafana monitoring. Designed for high availability and fault tolerance. (Not publicly available)
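Since the code isn't public, here is only an illustration of the self-healing idea: a watchdog that polls Airflow's webserver `/health` endpoint (which reports scheduler heartbeat status) and restarts the scheduler when it goes stale. The URL, unit name, and interval are assumptions:

```python
import subprocess
import time

import requests

HEALTH_URL = "http://localhost:8080/health"  # assumed webserver address
SCHEDULER_UNIT = "airflow-scheduler"         # hypothetical systemd unit name

def scheduler_healthy() -> bool:
    try:
        status = requests.get(HEALTH_URL, timeout=5).json()
        return status["scheduler"]["status"] == "healthy"
    except (requests.RequestException, KeyError, ValueError):
        return False

while True:
    if not scheduler_healthy():
        # Self-healing step: ask systemd to restart the scheduler on this node.
        subprocess.run(["systemctl", "restart", SCHEDULER_UNIT], check=False)
    time.sleep(60)
```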
Large-scale archival and deletion pipelines for a multi-product Cassandra database. Migrated archived data to Amazon S3 in Hive format and configured AWS Athena for querying it, reducing query costs by 60%. Ensured data governance compliance throughout the archival process. (Not publicly available)
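A hedged sketch of the archival write path (keyspace, table, bucket, and partition columns are placeholders; reading Cassandra this way assumes the DataStax spark-cassandra-connector):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cassandra-archival").getOrCreate()

# Read the source table through the spark-cassandra-connector.
df = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="events", table="audit_log")  # placeholder names
    .load()
)

# Write Hive-style partitioned Parquet to S3. An Athena table partitioned on the
# same columns then prunes whole prefixes per query, which is where the cost
# reduction comes from: less data scanned, lower bill.
(
    df.write.mode("append")
    .partitionBy("product", "year")                 # placeholder partition columns
    .parquet("s3a://archive-bucket/audit_log/")     # placeholder bucket
)
```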
Highly available APIs, databases, and Airflow services using Keepalived (VIPs), Patroni (PostgreSQL HA), and shared storage via GlusterFS/NFS. Nginx load balancing with Route53 and Azure DNS for global distribution. (Not publicly available)
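Code not public, but as an illustration: Keepalived's `vrrp_script` can track a small probe like the sketch below, lowering the node's priority (and moving the VIP to a standby) when the local service stops answering. The endpoint and timeout are assumptions:

```python
#!/usr/bin/env python3
"""Health probe for Keepalived's vrrp_script: exit 0 = healthy, nonzero = fail over."""
import sys

import requests

PROBE_URL = "http://127.0.0.1:8000/healthz"  # hypothetical local API endpoint

try:
    resp = requests.get(PROBE_URL, timeout=2)
    sys.exit(0 if resp.status_code == 200 else 1)
except requests.RequestException:
    # Unreachable service -> nonzero exit tells Keepalived to fail over.
    sys.exit(1)
```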