Traffic Trajectory Data Pipeline for ML Models
Build comprehensive data preprocessing pipeline to transform raw IKA traffic datasets (inD, rounD, exiD) containing thousands of vehicle trajectories into clean, normalized time series data suitable for training generative ML models. Must handle missing data, variable-length sequences, multi-agent interactions, and convert outputs to simulation-ready XML format.
- Data Extraction & Cleaning: Processed 3 large-scale IKA datasets (intersection, roundabout, expressway scenarios) with custom parsers for trajectory data, handling edge cases and data quality issues
- Feature Engineering: Engineered features including velocity, acceleration, heading, relative positions, and interaction features between multiple vehicles
- Sliding Window Segmentation: Implemented a 500-frame sliding window approach to extract fixed-length trajectory sequences, creating thousands of training samples from the raw data (see the sketch after this list)
- Normalization Pipeline: Built standardization and min-max normalization pipelines to ensure stable training, with separate statistics per dataset to preserve traffic pattern characteristics
- Data Augmentation: Implemented data augmentation strategies including temporal shifting and noise injection to increase training data diversity
- XML Conversion Pipeline: Developed automated pipeline to convert generated trajectories back to XML format for CPM Remote platform integration, handling coordinate transformations and metadata
- Data Validation: Built validation checks for trajectory continuity, physical constraints (velocity/acceleration limits), and format compliance
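A minimal sketch of the sliding-window segmentation step, assuming each trajectory has already been parsed into a NumPy array of per-frame features; the window length matches the 500-frame setup above, while the stride and helper names are illustrative rather than the production code:

```python
import numpy as np

def segment_trajectory(track: np.ndarray, window: int = 500, stride: int = 50) -> list:
    """Slice one variable-length trajectory (frames x features) into
    fixed-length windows; tracks shorter than one window yield nothing."""
    return [track[start:start + window]
            for start in range(0, len(track) - window + 1, stride)]

def build_training_set(tracks: list, window: int = 500, stride: int = 50) -> np.ndarray:
    """Stack the windows from all trajectories into one (N, window, features) array."""
    windows = [seg for t in tracks for seg in segment_trajectory(t, window, stride)]
    return np.stack(windows)

# Example: two synthetic tracks with x, y, vx, vy per frame
tracks = [np.random.randn(1200, 4), np.random.randn(800, 4)]
X = build_training_set(tracks)
print(X.shape)  # (22, 500, 4) with these lengths and a 50-frame stride
```

Overlapping windows (stride smaller than the window length) multiply the number of samples obtained from the same recordings, which is how thousands of training sequences come out of the raw tracks.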
✓ Automated data pipeline reduced preprocessing time from weeks to hours
✓ Enabled training of generative models on diverse traffic scenarios
✓ Seamless integration with simulation platform through XML conversion
Production LLM Token & Cost Tracking System
Build accurate token counting and cost tracking infrastructure for a multi-tenant LLM platform supporting diverse models (OpenAI, Llama, Gemma, Mistral, Qwen, DeepSeek) with different tokenization methods. Must handle both text and vision inputs, aggregate costs per tenant, and provide real-time billing data.
- Unified Tokenization Layer: Integrated Microsoft.ML.Tokenizers for text and pythonnet for vision model tokenizers, creating an abstraction layer that supports all model families
- Real-time Cost Calculation: Implemented streaming token counting during LLM responses with accurate pricing models per deployment (a simplified sketch follows this list)
- Data Aggregation Pipeline: Built Hangfire background jobs for hourly/daily cost aggregation, quota monitoring, and usage analytics
- Multi-tenant Analytics: Designed MongoDB schemas for efficient querying of usage patterns, cost breakdowns, and trend analysis per tenant
- Quota Management: Implemented real-time quota enforcement with overflow handling and alert system
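The production implementation is .NET-based; the Python sketch below only illustrates the core idea of streaming cost accounting: per-deployment prices applied to prompt tokens up front and to completion tokens as each streamed chunk arrives. All prices, deployment names, and token counts here are placeholders.

```python
from dataclasses import dataclass

@dataclass
class Pricing:
    prompt_per_1k: float      # USD per 1,000 prompt tokens (placeholder value)
    completion_per_1k: float  # USD per 1,000 completion tokens (placeholder value)

PRICING = {"gpt-like-deployment": Pricing(0.0005, 0.0015)}

@dataclass
class UsageTracker:
    deployment: str
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add_prompt(self, n: int) -> None:
        self.prompt_tokens += n

    def add_chunk(self, n: int) -> None:
        # Called once per streamed chunk, so the running cost is known in near real time
        self.completion_tokens += n

    def cost(self) -> float:
        p = PRICING[self.deployment]
        return (self.prompt_tokens / 1000 * p.prompt_per_1k
                + self.completion_tokens / 1000 * p.completion_per_1k)

# Usage: count tokens per chunk with the model's own tokenizer, feed counts in here
tracker = UsageTracker("gpt-like-deployment")
tracker.add_prompt(420)
for chunk_tokens in (12, 9, 15):   # stand-in for per-chunk token counts
    tracker.add_chunk(chunk_tokens)
print(f"${tracker.cost():.6f}")
```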
✓ Prevented budget overruns with real-time quota enforcement
✓ Enabled data-driven decisions on model deployment and resource allocation
✓ Generated detailed usage reports for financial planning
Quantitative Investment Strategies
Collection of quantitative investment strategies including Market Health Indicator (MHI) and other algorithmic trading approaches. Implements data pipelines for fetching market data, computing technical indicators, backtesting strategies, and visualizing results.
- Market Health Indicator (MHI): Custom composite indicator combining multiple market metrics to assess overall market conditions
- Data Pipeline: Automated fetching and preprocessing of historical market data from various sources
- Backtesting Framework: Simulates strategy performance on historical data with transaction costs and slippage
- Performance Analytics: Calculates Sharpe ratio, maximum drawdown, alpha/beta, and other risk-adjusted performance metrics (see the sketch after this list)
- Visualization: Generates equity curves, drawdown charts, and strategy performance dashboards
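For reference, a minimal sketch of two of these risk metrics, the annualized Sharpe ratio and the maximum drawdown, computed from a per-period return series; the data below is synthetic and the annualization factor assumes daily returns:

```python
import numpy as np

def sharpe_ratio(returns: np.ndarray, risk_free: float = 0.0, periods: int = 252) -> float:
    """Annualized Sharpe ratio from per-period returns."""
    excess = returns - risk_free / periods
    return np.sqrt(periods) * excess.mean() / excess.std(ddof=1)

def max_drawdown(returns: np.ndarray) -> float:
    """Largest peak-to-trough decline of the cumulative equity curve."""
    equity = np.cumprod(1.0 + returns)
    peaks = np.maximum.accumulate(equity)
    return float(((equity - peaks) / peaks).min())

# Example on synthetic daily returns
rng = np.random.default_rng(0)
daily = rng.normal(0.0004, 0.01, 252)
print(f"Sharpe: {sharpe_ratio(daily):.2f}, MaxDD: {max_drawdown(daily):.1%}")
```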
Portfolio Monitoring & Alert Systems
Automated monitoring system that tracks portfolio drawdowns and triggers alerts when losses exceed predefined thresholds (sketched below). Helps manage risk by providing early warnings of significant portfolio declines.
Continuous monitoring and alerting system for tracking key metrics and anomalies. Implements data quality checks, drift detection, and automated notifications.
Alerts are delivered via my self-hosted Matrix server (rickandzoey.com) using a small AlertBot — daily digest and threshold crossovers only, to keep signal high and noise low.
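A simplified sketch of the threshold logic, assuming an in-memory equity curve; the warning and critical thresholds are placeholders, and delivery to the Matrix room is stubbed out rather than shown:

```python
import numpy as np

def check_drawdown(equity, warn: float = 0.05, critical: float = 0.10):
    """Return an alert message when the current drawdown crosses a threshold.
    Thresholds here are placeholders; the real ones live in the bot's config."""
    curve = np.asarray(equity, dtype=float)
    peak = curve.max()
    drawdown = (peak - curve[-1]) / peak
    if drawdown >= critical:
        return f"CRITICAL: drawdown {drawdown:.1%}"
    if drawdown >= warn:
        return f"WARNING: drawdown {drawdown:.1%}"
    return None

# The real bot posts the message to the Matrix room; here we just print it.
alert = check_drawdown([100_000, 104_000, 98_500, 92_000])
if alert:
    print(alert)  # -> CRITICAL: drawdown 11.5%
```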
Business Process Intelligence
Course focused on extracting insights from business process data using process mining techniques, data analytics, and business intelligence tools.
- Process discovery from event logs (a minimal example follows this list)
- Conformance checking and process enhancement
- Performance analysis and bottleneck detection
- Business process optimization through data-driven insights
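As a toy illustration of process discovery, the sketch below derives a directly-follows relation (the starting point for most discovery algorithms) from a small pandas event log; the log itself is made up and stands in for a real XES/CSV export:

```python
import pandas as pd
from collections import Counter

# Toy event log: one row per event, as in a typical XES/CSV export
log = pd.DataFrame({
    "case_id":  [1, 1, 1, 2, 2, 2, 2],
    "activity": ["register", "check", "approve",
                 "register", "check", "rework", "approve"],
    "timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 10:00", "2024-01-01 12:00",
        "2024-01-02 09:00", "2024-01-02 09:30", "2024-01-02 11:00", "2024-01-02 15:00",
    ]),
})

# Directly-follows relation: within each case, count which activity follows which
dfg = Counter()
for _, case in log.sort_values("timestamp").groupby("case_id"):
    acts = case["activity"].tolist()
    dfg.update(zip(acts, acts[1:]))

for (a, b), n in dfg.items():
    print(f"{a} -> {b}: {n}")
```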
Introduction to Data Science
Comprehensive course covering data science fundamentals including data wrangling, exploratory data analysis, statistical modeling, and machine learning pipelines.
- Data cleaning and preprocessing with Pandas
- Exploratory Data Analysis (EDA) and visualization
- Statistical inference and hypothesis testing
- Supervised and unsupervised learning algorithms
- Feature engineering and model evaluation
- End-to-end ML pipeline development (see the sketch below)
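A compact example of the kind of end-to-end pipeline covered, using scikit-learn's bundled dataset; this is purely illustrative, not coursework code:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Preprocessing and model chained into one object so the same
# transformations are applied at train and inference time
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
print(classification_report(y_test, pipe.predict(X_test)))
```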
Deep Learning Packages - Data Science Assignments
Series of data science assignments focusing on applying deep learning frameworks to real-world datasets. Covered data preprocessing, model training, hyperparameter tuning, and results analysis.
Operations Research & Optimization
Advanced operations research courses covering linear programming, network flows, integer programming, dynamic programming, and queueing theory. Applied mathematical optimization to real-world business problems.
Hands-on course using algebraic modeling languages (AML) to formulate and solve large-scale optimization problems. Implemented solutions for supply chain, scheduling, and resource allocation problems.
- Linear and integer programming formulations
- Sensitivity analysis and duality theory
- Network optimization and shortest path algorithms
- Transportation and assignment problems (a small example is sketched after this list)
- Production planning and scheduling
- Modeling with AMPL, GAMS, or similar languages
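A small transportation-problem example in the spirit of these formulations, solved here with scipy.optimize.linprog instead of AMPL/GAMS; the supplies, demands, and costs are made up:

```python
import numpy as np
from scipy.optimize import linprog

# Toy balanced transportation problem: 2 plants, 3 customers
cost   = np.array([[4.0, 6.0, 9.0],
                   [5.0, 3.0, 7.0]])
supply = np.array([20.0, 30.0])
demand = np.array([10.0, 25.0, 15.0])

m, n = cost.shape
c = cost.ravel()  # decision variables x[i, j], flattened row-major

# Supply rows: sum_j x[i, j] == supply[i]
A_supply = np.zeros((m, m * n))
for i in range(m):
    A_supply[i, i * n:(i + 1) * n] = 1.0

# Demand columns: sum_i x[i, j] == demand[j]
A_demand = np.zeros((n, m * n))
for j in range(n):
    A_demand[j, j::n] = 1.0

res = linprog(c,
              A_eq=np.vstack([A_supply, A_demand]),
              b_eq=np.concatenate([supply, demand]),
              bounds=(0, None),
              method="highs")
print(res.x.reshape(m, n))
print(res.fun)  # optimal cost: 240.0 for this data
```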
FinTech Data Analytics Platform
Full-stack web application for banking with integrated data analytics features. Built ETL pipelines for transaction data, implemented analytical dashboards, and provided business intelligence insights.
- Data Pipeline: Designed MongoDB schemas for efficient storage and querying of transaction data
- Analytics Engine: Implemented aggregation pipelines for computing financial metrics, trends, and KPIs (illustrated below)
- Dashboard: Built interactive visualizations for transaction patterns, customer segments, and financial health indicators
- Reporting: Automated generation of periodic reports with key business metrics
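A sketch of the kind of aggregation pipeline behind the financial KPIs, written against pymongo; the collection and field names ("transactions", "amount", "category", "ts", "status") are assumptions for illustration, not the actual schema:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["fintech"]  # placeholder database name

# Monthly transaction volume, average amount, and count per category
pipeline = [
    {"$match": {"status": "completed"}},
    {"$group": {
        "_id": {
            "month": {"$dateToString": {"format": "%Y-%m", "date": "$ts"}},
            "category": "$category",
        },
        "total_volume": {"$sum": "$amount"},
        "avg_amount": {"$avg": "$amount"},
        "tx_count": {"$sum": 1},
    }},
    {"$sort": {"_id.month": 1, "total_volume": -1}},
]

for row in db.transactions.aggregate(pipeline):
    print(row)
```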