cal

// distributed systems & high availability architecture.

About

dedicated to designing high-performance, fault-tolerant distributed systems. focusing on scalability, data integrity, and the beauty of minimalist technical architecture. exploring the intersections of data engineering, cloud-native infrastructure, and elegant code.

Experience

Stealth

2024 - Present
// Distributed Systems Engineer
  • building high-scale data infrastructure and distributed execution engines.
  • focusing on fault tolerance, automated scaling, and low-latency data processing.

Decentro (YC S20)

2024
// Data Engineer
  • engineered multi-terabyte data archival pipelines.
  • optimized query performance and reduced cloud infrastructure costs by 60%.

Academic

University

2023 - 2025
// MSc, Data Science

Selected Builds

[ distributed systems / infrastructure / tools ]

VeilGuard

Universal code obfuscation tool written in Rust. Features AST-based identifier renaming, control flow flattening with opaque predicates, per-string nonce encryption, and a JIT-decrypting native runtime via AES-256-CTR.

Rust
Python
AST Parsing
AES-256-CTR
HMAC
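
A minimal sketch of the AST-based identifier renaming idea, written in Python with the standard `ast` module. This is not VeilGuard's Rust implementation; the class name `RenameLocals`, the helper `obfuscate`, and the `_v0`, `_v1` naming scheme are hypothetical and only illustrate the technique. It renames parameters and assigned variables, and leaves builtins and the function name alone.

```python
import ast


class RenameLocals(ast.NodeTransformer):
    """Rename function parameters and assigned variables to opaque names."""

    def __init__(self):
        self.mapping = {}

    def _alias(self, name):
        # Give each distinct identifier one stable, meaningless replacement.
        if name not in self.mapping:
            self.mapping[name] = f"_v{len(self.mapping)}"
        return self.mapping[name]

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        # Rewrite assignments and names already mapped; anything else
        # (builtins, imported names) is left untouched.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._alias(node.id)
        return node


def obfuscate(source: str) -> str:
    tree = RenameLocals().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


if __name__ == "__main__":
    print(obfuscate("def area(width, height):\n    total = width * height\n    return total\n"))
```

The sketch ignores scoping, so it is only safe for small snippets: callers that pass keyword arguments would break once parameters are renamed, and a real tool would have to track scopes and call sites.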

Data File Viewer

VS Code extension for viewing and exploring binary data files directly in the editor. Supports 12 formats: pkl, h5, parquet, feather, joblib, npy, npz, msgpack, arrow, avro, nc, and mat. Uses a Python backend running in isolated virtual environments for safe, on-demand parsing, with file loading optimized so that large datasets do not freeze the editor.

TypeScript
Python
VS Code API
Webpack

Tab Relay

Chromium extension for cross-browser tab management. Transfer tab groups via system clipboard while preserving group metadata. Features include merging, splitting, and local persistence of tab groups across window sessions.

TypeScript
Chrome Extensions API
Webpack
JSON Scaling

AWS Terraform Multi-Environment Template

Production-ready Terraform template supporting dev, staging, and prod environments. Modular IaC architecture with reusable components for VPC, ECS, RDS, ALB, ECR, Route53, and remote state management. Implements multi-environment patterns using for_each iteration and environment-specific conditional expressions.

Terraform
AWS
VPC
ECS
RDS
ALB
ECR

Parallelization Engine

Distributed parallelization engine using Docker, Celery, and RabbitMQ for scalable task execution. Enables dynamic worker scaling across multiple nodes for compute-intensive workloads. Focused on fault tolerance, task retries, and throughput optimization for real-world data pipelines.

Python
Celery
RabbitMQ
Docker
Redis
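
A minimal sketch of the task-retry idea using only the standard library. It is not Celery code and not the engine's actual retry policy; `run_with_retries`, `max_retries`, and `base_delay` are hypothetical names chosen to show retrying with exponential backoff.

```python
import time


def run_with_retries(task, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call task(); on failure wait base_delay * 2**attempt, then try again."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.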

Motor Vehicle Collision Analysis Pipeline

End-to-end ETL pipeline that processes traffic accident data to identify patterns and insights. Built with Apache Airflow for orchestration and Spark for large-scale data processing. Includes data visualization dashboards for exploring collision trends.

Python
Apache Airflow
Spark
Data Visualization

Real Estate Analysis Pipeline

Data pipeline that aggregates property listing data to generate market insights. Uses dbt for data transformation and Snowflake for cloud data warehousing. Implements dimensional modeling for analytics queries.

Python
dbt
Snowflake
Data Modeling

LinkedIn Network Analyzer

Tool that extracts and processes professional network data to uncover industry trends and connection patterns. Built with Selenium for web automation and MongoDB for storing extracted data.

Python
Selenium
BeautifulSoup
MongoDB

X-Purge

Chrome extension for smart X (Twitter) unfollowing. Mimics human behavior with randomized delays and daily caps. Applies relationship, activity, and profile-quality filters to improve the following-to-follower ratio without depending on the X API.

JavaScript
Manifest V3
Chrome Storage API
DOM Manipulation
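
A rough sketch of the randomized-delay and daily-cap pattern. The extension itself is JavaScript; this Python version, with the hypothetical names `plan_actions`, `daily_cap`, `min_delay`, and `max_delay`, only illustrates the pacing logic and assumes delays are drawn uniformly from a fixed range.

```python
import random


def plan_actions(candidates, daily_cap, min_delay=2.0, max_delay=8.0, rng=random):
    """Yield (candidate, delay_seconds) pairs, stopping at the daily cap."""
    for candidate in candidates[:daily_cap]:
        # Each action gets its own random pause so the timing is not uniform.
        yield candidate, rng.uniform(min_delay, max_delay)
```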

Multi-Node Airflow Cluster

Multi-node Apache Airflow cluster with distributed schedulers, metadata DB replication using Patroni, self-healing capabilities, and Prometheus-Grafana monitoring. Designed for high availability and fault tolerance. (Not publicly available)

Airflow
PostgreSQL
Patroni
Keepalived
GlusterFS
Prometheus

Data Archival/Deletion Pipeline

Large-scale archival and deletion pipelines for a multi-product Cassandra database. Migrated archived data to Amazon S3 in Hive format and configured Amazon Athena to query it, reducing query costs by 60%. Ensured data governance compliance throughout the archival process. (Not publicly available)

Python
Polars
Airflow
S3
Athena
Cassandra
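
A small sketch of what a Hive-style layout on S3 can look like, assuming "Hive format" here means key=value partition directories. The function name, the `year`/`month`/`day` partition columns, and the example prefix and table are hypothetical, not the pipeline's real schema.

```python
from datetime import date


def hive_partition_key(prefix: str, table: str, day: date, filename: str) -> str:
    """Build an S3 object key using Hive-style key=value partition directories."""
    return (
        f"{prefix}/{table}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )
```

With a layout like this, Athena can skip partitions that a query's date filter excludes, which is one common way to cut the amount of data scanned.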

High Availability Infrastructure

Highly available APIs, databases, and Airflow services using Keepalived (VIPs), Patroni (PostgreSQL HA), and shared storage via GlusterFS/NFS. Nginx load balancing with Route53 and Azure DNS for global distribution. (Not publicly available)

Nginx
Patroni
Keepalived
PostgreSQL
Prometheus
Route53

Stack

distributed systems
high availability
infrastructure as code
data engineering
python
go
rust
aws
gcp
kubernetes
terraform
postgresql
apache airflow

Open for Intel

Reach out for collaboration or systems discussion via [ X ] or [ Email ]