Data Engineer

Data pipelines, quantitative strategies, business intelligence, and analytics systems

Traffic Trajectory Data Pipeline for ML Models

RWTH Cyber-Physical Mobility Group | Master Thesis | Oct 2023 - Apr 2024

Tech Stack
Python, Pandas, NumPy, Data Preprocessing, Time Series, XML Processing
Challenge

Build a comprehensive data preprocessing pipeline that transforms raw IKA traffic datasets (inD, rounD, exiD), containing thousands of vehicle trajectories, into clean, normalized time series suitable for training generative ML models. The pipeline must handle missing data, variable-length sequences, and multi-agent interactions, and convert its outputs to a simulation-ready XML format.
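A minimal sketch of three of the preprocessing steps named above, for a single scalar channel (for example, one position coordinate of one vehicle): linear interpolation over missing samples, min-max normalization with statistics fitted across all sequences, and padding of variable-length sequences to a fixed length with a validity mask. The function names and the list-of-floats input are assumptions for illustration. This is not the thesis code, and the real pipeline operates on the datasets' multi-column track files.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical helpers operating on one scalar channel of one trajectory.

def interpolate_gaps(values: Sequence[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation; leading/trailing gaps copy the nearest observed value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("sequence has no observed values")
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:        # gap at the start
            filled.append(values[right])
        elif right is None:     # gap at the end
            filled.append(values[left])
        else:                   # interior gap: interpolate between the two neighbours
            t = (i - left) / (right - left)
            filled.append(values[left] + t * (values[right] - values[left]))
    return filled

def fit_min_max(sequences: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Global min and max over all sequences (fit on training data only)."""
    lo = min(min(s) for s in sequences)
    hi = max(max(s) for s in sequences)
    return lo, hi

def apply_min_max(seq: Sequence[float], lo: float, hi: float) -> List[float]:
    span = (hi - lo) or 1.0  # guard against a constant channel
    return [(v - lo) / span for v in seq]

def pad_with_mask(seq: Sequence[float], length: int,
                  pad_value: float = 0.0) -> Tuple[List[float], List[int]]:
    """Truncate or pad to a fixed length; the mask marks real (1) versus padded (0) steps."""
    clipped = list(seq[:length])
    n_pad = length - len(clipped)
    return clipped + [pad_value] * n_pad, [1] * len(clipped) + [0] * n_pad

# Usage: two toy sequences, one with a missing sample, batched to length 4.
raw = [[0.0, None, 2.0, 3.0], [10.0, 12.0]]
filled = [interpolate_gaps(s) for s in raw]
lo, hi = fit_min_max(filled)
batch = [pad_with_mask(apply_min_max(s, lo, hi), length=4) for s in filled]
```

Fitting the normalization statistics once across the whole training set, rather than per trajectory, keeps positions comparable between vehicles, which matters when a generative model learns multi-agent interactions.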

Solution
Business Value
✓ Processed 10,000+ real-world vehicle trajectories into ML-ready format
✓ Automated data pipeline reduced preprocessing time from weeks to hours
✓ Enabled training of generative models on diverse traffic scenarios
✓ Seamless integration with simulation platform through XML conversion
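The XML conversion step can be sketched with Python's standard xml.etree.ElementTree. The element and attribute names (scenario, timestep, vehicle, x, y) are hypothetical placeholders, since the target simulator's schema is not given here. The sketch illustrates regrouping per-vehicle tracks into per-timestep frames, which is how many simulators consume scenario input.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical input layout: {track_id: [(time_s, x, y), ...]}
Tracks = Dict[str, List[Tuple[float, float, float]]]

def trajectories_to_xml(tracks: Tracks) -> str:
    """Serialize trajectories as one <timestep> per time value, each listing the vehicles present."""
    frames: Dict[float, List[Tuple[str, float, float]]] = {}
    for track_id, samples in tracks.items():
        for t, x, y in samples:
            frames.setdefault(t, []).append((track_id, x, y))

    root = ET.Element("scenario")
    for t in sorted(frames):
        step = ET.SubElement(root, "timestep", time=f"{t:.2f}")
        for track_id, x, y in sorted(frames[t]):  # deterministic vehicle order
            ET.SubElement(step, "vehicle", id=track_id, x=f"{x:.3f}", y=f"{y:.3f}")
    return ET.tostring(root, encoding="unicode")

xml_text = trajectories_to_xml({
    "veh1": [(0.0, 1.0, 2.0), (0.04, 1.5, 2.0)],
    "veh2": [(0.0, 5.0, 5.0)],
})
```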
Links

GitHub Repository

Production LLM Token & Cost Tracking System

KIconnect - RWTH IT Center | Full-time | Sep 2024 - Present

Tech Stack
C#/.NET, MongoDB, Microsoft.ML.Tokenizers, pythonnet, Hangfire, LINQ
Challenge

Build accurate token-counting and cost-tracking infrastructure for a multi-tenant LLM platform that serves diverse models (OpenAI, Llama, Gemma, Mistral, Qwen, DeepSeek), each with its own tokenization method. The system must handle both text and vision inputs, aggregate costs per tenant, and provide real-time billing data.
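Vision inputs are not tokenized like text; each provider counts them with its own size-based rule. As one documented example, OpenAI describes a tile-based rule for GPT-4o-class models at high detail: the image is scaled to fit within 2048×2048, then scaled so its shorter side is 768 px, and each 512 px tile costs 170 tokens plus a base of 85 (low detail is a flat 85). The sketch below implements that rule in Python, whereas the production system is C#/.NET. The constants are parameters because they differ between models, and the assumption that small images are never upscaled is labeled in the code. Other model families (Llama, Qwen, and so on) use different schemes.

```python
import math

def estimate_image_tokens(width: int, height: int, detail: str = "high",
                          base: int = 85, per_tile: int = 170) -> int:
    """Tile-based image token estimate (constants model-specific; defaults follow OpenAI's published rule)."""
    if detail == "low":
        return base
    # Step 1: fit within 2048 x 2048, preserving aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Step 2: scale so the shorter side is 768 px (assumption: images are only shrunk, never upscaled).
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # Step 3: count 512 px tiles covering the scaled image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return base + per_tile * tiles
```

For example, a 1024×1024 image is scaled to 768×768, covered by 4 tiles, and estimated at 85 + 4 × 170 = 765 tokens.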

Solution
Business Value
✓ Accurate cost attribution across 10+ departments with token-level precision
✓ Prevented budget overruns with real-time quota enforcement
✓ Enabled data-driven decisions on model deployment and resource allocation
✓ Generated detailed usage reports for financial planning
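Per-tenant cost aggregation with quota enforcement could look roughly like the in-memory sketch below: check the estimated cost before a request, then record the actual cost after the response. Price, CostLedger, the model name, and all prices and budgets are hypothetical. The production system persists usage in MongoDB, which this sketch does not model. Decimal is used instead of float so that summing many small charges does not drift.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict

@dataclass(frozen=True)
class Price:
    """Hypothetical per-model prices, in currency units per 1,000 tokens."""
    input_per_1k: Decimal
    output_per_1k: Decimal

class CostLedger:
    """In-memory stand-in for per-tenant usage storage."""

    def __init__(self, prices: Dict[str, Price], budgets: Dict[str, Decimal]) -> None:
        self.prices = prices
        self.budgets = budgets
        self.spent: Dict[str, Decimal] = defaultdict(Decimal)

    def cost(self, model: str, input_tokens: int, output_tokens: int) -> Decimal:
        p = self.prices[model]
        return (Decimal(input_tokens) * p.input_per_1k
                + Decimal(output_tokens) * p.output_per_1k) / 1000

    def allow(self, tenant: str, model: str, est_input: int, est_output: int) -> bool:
        """Quota check before the request: would the estimated cost exceed the tenant's budget?"""
        projected = self.spent[tenant] + self.cost(model, est_input, est_output)
        return projected <= self.budgets.get(tenant, Decimal(0))  # unknown tenant: zero budget

    def record(self, tenant: str, model: str, input_tokens: int, output_tokens: int) -> Decimal:
        """Book actual usage after the response, using the real token counts."""
        charge = self.cost(model, input_tokens, output_tokens)
        self.spent[tenant] += charge
        return charge

prices = {"model-a": Price(Decimal("0.50"), Decimal("1.50"))}
ledger = CostLedger(prices, {"dept-x": Decimal("1.00")})
first_charge = ledger.record("dept-x", "model-a", 1000, 200)  # 0.50 + 0.30
```

Defaulting unknown tenants to a zero budget makes the system deny by default instead of serving usage that no department is billed for.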

Quantitative Investment Strategies

Personal Project | Python | Sep 2024

Tech Stack
Python, Pandas, NumPy, Matplotlib, Financial APIs
Description

A collection of quantitative investment strategies, including a Market Health Indicator (MHI) and other algorithmic trading approaches. It implements data pipelines for fetching market data, computing technical indicators, backtesting strategies, and visualizing results.
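The MHI's definition is not described here, so this sketch uses a stand-in to show the generic indicator-to-backtest pattern: compute an indicator, derive a position signal, and apply that signal to the following period's return to avoid look-ahead bias. The simple moving-average crossover and the function names are illustrative assumptions, not the repository's strategies, and the backtest ignores transaction costs and slippage.

```python
from typing import List, Optional, Sequence

def sma(prices: Sequence[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until enough history exists."""
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out

def backtest_sma_crossover(prices: Sequence[float], fast: int, slow: int) -> List[float]:
    """Equity curve (starting at 1.0) of a long/flat rule: hold while fast SMA > slow SMA.

    The signal known at the close of day i is applied to the return from day i to i + 1,
    so the backtest never trades on information it would not have had.
    """
    f, s = sma(prices, fast), sma(prices, slow)
    equity = [1.0]
    for i in range(len(prices) - 1):
        long = f[i] is not None and s[i] is not None and f[i] > s[i]
        daily_return = prices[i + 1] / prices[i] - 1.0 if long else 0.0
        equity.append(equity[-1] * (1.0 + daily_return))
    return equity
```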

Features
Links

GitHub Repository

Portfolio Monitoring & Alert Systems

Personal Projects | Python | 2024

Tech Stack
Python, Pandas, Alerts/Notifications, Data Monitoring
Drawdown Alerts

An automated monitoring system that tracks portfolio drawdowns and triggers alerts when losses exceed predefined thresholds, giving early warning of significant portfolio declines to help manage risk.
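A minimal sketch of the drawdown logic, assuming a list of daily portfolio values and hypothetical 10% and 20% thresholds. Drawdown is measured from the running peak. Each threshold fires at most once per decline, and all thresholds re-arm when the portfolio sets a new high, so a prolonged slump does not repeat the same alert every day.

```python
from typing import List, Optional, Sequence, Set, Tuple

def drawdown_alerts(values: Sequence[float],
                    thresholds: Sequence[float] = (0.10, 0.20)) -> List[Tuple[int, float]]:
    """Return (index, threshold) whenever the drawdown from the running peak first reaches a threshold."""
    alerts: List[Tuple[int, float]] = []
    peak: Optional[float] = None
    fired: Set[float] = set()
    for i, value in enumerate(values):
        if peak is None or value > peak:
            peak = value
            fired.clear()  # new high: re-arm all thresholds
        drawdown = 1.0 - value / peak  # e.g. 0.15 means 15% below the peak
        for th in thresholds:
            if drawdown >= th and th not in fired:
                fired.add(th)
                alerts.append((i, th))
    return alerts
```

For the series 100, 95, 89, 85, 79, 90, 105, 94, the 10% alert fires at 89, the 20% alert at 79, and the 10% alert fires again at 94 because 105 set a new peak.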

GitHub Repository

Watchtower

A continuous monitoring and alerting system that tracks key metrics and flags anomalies. It implements data quality checks, drift detection, and automated notifications.
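One common way to implement drift detection is the Population Stability Index (PSI). PSI compares the binned distribution of a new batch against a reference sample: PSI = Σ (a_i - e_i) · ln(a_i / e_i), where e_i and a_i are the reference and current shares in bin i. The sketch pairs PSI with a null-rate data quality check. Whether Watchtower uses PSI is not stated here. The bin edges, the 5% null-rate limit, and the 0.25 PSI cut-off (a widely used rule of thumb) are assumptions.

```python
import math
from bisect import bisect_right
from typing import List, Optional, Sequence

def bin_shares(xs: Sequence[float], edges: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Fraction of xs in each bin defined by sorted edges; empty bins get eps so the log stays finite."""
    counts = [0] * (len(edges) + 1)
    for x in xs:
        counts[bisect_right(edges, x)] += 1
    return [max(c / len(xs), eps) for c in counts]

def psi(reference: Sequence[float], current: Sequence[float], edges: Sequence[float]) -> float:
    e, a = bin_shares(reference, edges), bin_shares(current, edges)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))

def null_rate(values: Sequence[Optional[float]]) -> float:
    return sum(v is None for v in values) / len(values) if values else 0.0

def check_batch(reference: Sequence[float], batch: Sequence[Optional[float]],
                edges: Sequence[float], max_null_rate: float = 0.05,
                max_psi: float = 0.25) -> List[str]:
    """Return human-readable issues; an empty list means the batch passed."""
    issues: List[str] = []
    rate = null_rate(batch)
    if rate > max_null_rate:
        issues.append(f"null rate {rate:.1%} exceeds {max_null_rate:.1%}")
    observed = [v for v in batch if v is not None]
    if observed:
        score = psi(reference, observed, edges)
        if score > max_psi:
            issues.append(f"PSI {score:.3f} exceeds {max_psi}")
    return issues
```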

GitHub Repository

Alerts are delivered via my self-hosted Matrix server (rickandzoey.com) by a small AlertBot, which sends only a daily digest and threshold crossovers to keep signal high and noise low.
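Two pieces of such an alerting path can be sketched. The first is crossover detection, which emits a message only when a metric moves to the other side of its threshold rather than on every sample beyond it. The second builds a Matrix client-server API request (PUT /_matrix/client/v3/rooms/{roomId}/send/m.room.message/{txnId}) that posts an m.text message to a room. The homeserver URL, room ID, and token are placeholders, and this is not the AlertBot's actual code. Sending is left to urllib.request.urlopen.

```python
import json
import urllib.request
import uuid
from typing import List, Optional, Sequence, Tuple
from urllib.parse import quote

def crossovers(values: Sequence[float], threshold: float) -> List[Tuple[int, str]]:
    """Indices where the metric moves to the other side of the threshold (no repeats while it stays)."""
    events: List[Tuple[int, str]] = []
    prev_above: Optional[bool] = None
    for i, v in enumerate(values):
        above = v > threshold
        if prev_above is not None and above != prev_above:
            events.append((i, "above" if above else "below"))
        prev_above = above
    return events

def matrix_message_request(homeserver: str, room_id: str, access_token: str,
                           text: str, txn_id: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a Matrix client-server API call that posts an m.text message."""
    txn_id = txn_id or uuid.uuid4().hex  # the transaction ID lets the server de-duplicate retries
    url = (f"{homeserver.rstrip('/')}/_matrix/client/v3/rooms/{quote(room_id, safe='')}"
           f"/send/m.room.message/{quote(txn_id, safe='')}")
    body = json.dumps({"msgtype": "m.text", "body": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
    )

req = matrix_message_request("https://matrix.example.org/", "!room:example.org",
                             "TOKEN", "CPU above 90%", txn_id="t1")
```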

Business Process Intelligence

RWTH Course | Business Analytics | 2021-2022

Tech Stack
Process Mining, Data Analytics, Business Intelligence, Python
Description

Course focused on extracting insights from business process data using process mining techniques, data analytics, and business intelligence tools.
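A core process mining artifact is the directly-follows graph: for every case, sort its events by timestamp and count how often activity a is immediately followed by activity b. The minimal sketch below assumes an event log of (case_id, activity, timestamp) tuples; the example activities are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

def directly_follows(event_log: Iterable[Tuple[Hashable, str, float]]) -> Counter:
    """Count (a, b) pairs where activity b directly follows activity a within the same case."""
    traces: Dict[Hashable, List[Tuple[float, str]]] = defaultdict(list)
    for case_id, activity, timestamp in event_log:
        traces[case_id].append((timestamp, activity))
    dfg: Counter = Counter()
    for events in traces.values():
        events.sort(key=lambda e: e[0])  # order by time; stable sort keeps log order on ties
        for (_, a), (_, b) in zip(events, events[1:]):
            dfg[(a, b)] += 1
    return dfg

log = [
    ("c1", "register", 1), ("c1", "check", 2), ("c1", "pay", 3),
    ("c2", "register", 1), ("c2", "pay", 2),
    ("c3", "pay", 5), ("c3", "register", 4),  # recorded out of order
]
dfg = directly_follows(log)
```

Discovery algorithms such as the Alpha or Heuristics Miner start from counts like these to infer sequence, choice, and concurrency between activities.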

Key Topics

Introduction to Data Science

RWTH Course | Data Science Fundamentals | 2022-2023

Tech Stack
Python, Pandas, NumPy, Scikit-learn, Matplotlib, Jupyter
Description

Comprehensive course covering data science fundamentals including data wrangling, exploratory data analysis, statistical modeling, and machine learning pipelines.

Key Topics

Deep Learning Packages - Data Science Assignments

RWTH Assignment | Data Science | 2023

Tech Stack
Python, Pandas, NumPy, PyTorch, TensorFlow, Jupyter Notebook
Description

A series of data science assignments applying deep learning frameworks to real-world datasets, covering data preprocessing, model training, hyperparameter tuning, and results analysis.

Links

GitHub Repository

Operations Research & Optimization

RWTH Courses | Business Administration | 2022-2023

Tech Stack
Python, Modeling Languages, Optimization Solvers, Linear Programming, Mixed-Integer Programming
Operations Research 1 & 2

Advanced operations research courses covering linear programming, network flows, integer programming, dynamic programming, and queueing theory. Applied mathematical optimization to real-world business problems.
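Dynamic programming, one of the topics above, can solve small integer programs exactly. A capital-budgeting problem is one example: choose projects with integer costs to maximize total value within a budget. The minimal 0/1 knapsack sketch below reconstructs the chosen items; the numbers in the check are illustrative, not course data.

```python
from typing import List, Sequence, Tuple

def knapsack(values: Sequence[int], weights: Sequence[int], capacity: int) -> Tuple[int, List[int]]:
    """0/1 knapsack by dynamic programming; returns (best total value, chosen item indices)."""
    n = len(values)
    # best[i][c] = best value using the first i items with capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # option 2: take item i-1
    # Walk back through the table to recover which items were taken.
    chosen: List[int] = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], sorted(chosen)
```

The table has (n + 1) × (capacity + 1) entries, so the method is pseudo-polynomial: exact and fast for modest integer budgets, but not a replacement for a MIP solver on large instances.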

Practical Optimization with Modeling Languages

Hands-on course using algebraic modeling languages (AML) to formulate and solve large-scale optimization problems. Implemented solutions for supply chain, scheduling, and resource allocation problems.
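To illustrate what an AML formulation expresses, here is the classic transportation model. The course's specific problems and modeling language are not named here, so this is a generic example. Plants i ∈ I have capacity s_i, customers j ∈ J have demand d_j, c_ij is the unit shipping cost, and x_ij is the quantity shipped from i to j:

```latex
\begin{aligned}
\min_{x} \quad & \sum_{i \in I} \sum_{j \in J} c_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{j \in J} x_{ij} \le s_i && \forall i \in I \\
& \sum_{i \in I} x_{ij} \ge d_j && \forall j \in J \\
& x_{ij} \ge 0 && \forall i \in I,\ j \in J
\end{aligned}
```

In an AML such as AMPL or GAMS, each line maps almost one-to-one onto declarations of sets, parameters, variables, an objective, and constraints, and the solver is swapped without touching the model. Adding a binary open/close variable y_i with fixed cost f_i (objective term f_i y_i, capacity constraint Σ_j x_ij ≤ s_i y_i) turns it into the mixed-integer facility-location variant.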

Key Topics

FinTech Data Analytics Platform

RWTH Software Project | Full-stack + Analytics | 2020-2021

Tech Stack
MongoDB, Node.js, Angular, TypeScript, Data Visualization
Description

Full-stack web application for banking with integrated data analytics features. Built ETL pipelines for transaction data, implemented analytical dashboards, and provided business intelligence insights.
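The transaction ETL step can be sketched with Python's standard library, although the project itself used Node.js and MongoDB. The sketch extracts rows from CSV, transforms them by parsing dates and amounts and dropping malformed records, and then loads monthly net totals per account, the kind of aggregate a dashboard reads. The column names and the in-memory dict standing in for a MongoDB collection are assumptions.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from decimal import Decimal, InvalidOperation
from typing import Dict, List, Tuple

def extract(csv_text: str) -> List[Dict[str, str]]:
    """Read raw transaction rows (hypothetical columns: date, account, amount)."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows: List[Dict[str, str]]) -> List[dict]:
    """Parse types and drop malformed records."""
    clean = []
    for row in rows:
        try:
            day = datetime.strptime(row["date"].strip(), "%Y-%m-%d").date()
            amount = Decimal(row["amount"].strip())
            account = row["account"].strip()
        except (KeyError, AttributeError, ValueError, InvalidOperation):
            continue  # a real pipeline would log or quarantine the bad record
        clean.append({"account": account, "month": day.strftime("%Y-%m"), "amount": amount})
    return clean

def load_monthly_totals(records: List[dict]) -> Dict[Tuple[str, str], Decimal]:
    """Aggregate net flow per (account, month); stands in for writing to a database."""
    totals: Dict[Tuple[str, str], Decimal] = defaultdict(Decimal)
    for r in records:
        totals[(r["account"], r["month"])] += r["amount"]
    return dict(totals)

raw = ("date,account,amount\n"
       "2021-01-05,A1,100.50\n"
       "2021-01-20,A1,-40.25\n"
       "2021-02-01,A1,10\n"
       "bad-date,A1,5\n"
       "2021-01-07,B2,oops\n")
totals = load_monthly_totals(transform(extract(raw)))
```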

Data Engineering Components
Links

GitHub Repository