Multi-tenant AI Platform (Production)
Build a production-grade, multi-tenant LLM assistant platform that handles streaming responses, accurate token/cost tracking across diverse models, secure document handling, and enterprise SSO integration while maintaining EU data compliance.
- Unified Tokenization: Built a single tokenization layer for both text and vision inputs across OpenAI, Llama 3, Gemma 3, Mistral, Qwen, and DeepSeek models, using Microsoft.ML.Tokenizers plus custom Python bridges via pythonnet
- Real-time Streaming: Implemented streaming chat with SignalR for real-time token-by-token responses with precise cost tracking
- Admin Dashboard: Built comprehensive Vue 3 + Inertia admin UI for model/deployment management, user quotas, and reasoning effort configuration
- Security & Compliance: Integrated Shibboleth SSO and MongoDB Client-Side Field Level Encryption (CSFLE), and ensured EU data-region compliance
- Reliability: Implemented retry policies with Polly, circuit breakers, and graceful degradation
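The per-request cost tracking behind the chargeback reporting can be sketched as follows. This is a minimal illustration, not the platform's actual code: the model names and per-1K-token prices are hypothetical placeholders, and a real deployment would load pricing per model and region from configuration.

```python
from dataclasses import dataclass

# Hypothetical per-1K-token prices (USD); real pricing is loaded per model/region.
PRICING = {
    "gpt-4o":      {"prompt": 0.005, "completion": 0.015},
    "llama-3-70b": {"prompt": 0.001, "completion": 0.001},
}

@dataclass
class Usage:
    department: str
    model: str
    prompt_tokens: int
    completion_tokens: int

def cost_usd(u: Usage) -> float:
    """Compute the cost of one request from its token counts."""
    p = PRICING[u.model]
    return (u.prompt_tokens / 1000) * p["prompt"] + \
           (u.completion_tokens / 1000) * p["completion"]

def allocate(usages: list[Usage]) -> dict[str, float]:
    """Aggregate request costs per department for chargeback reports."""
    totals: dict[str, float] = {}
    for u in usages:
        totals[u.department] = totals.get(u.department, 0.0) + cost_usd(u)
    return totals
```

Streaming responses complicate this slightly: completion tokens are counted as they arrive, so the tracker accumulates counts during the stream and prices the request once the final chunk lands.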
✓ Accurate cost allocation across departments through precise token tracking
✓ 99.9% uptime with robust error handling and failover mechanisms
✓ Reduced AI infrastructure complexity for end users with single-sign-on
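The retry behavior is handled by Polly in the actual .NET stack; the underlying pattern, retrying a transient failure with exponential backoff, looks like this when sketched in Python. The attempt count and base delay are illustrative defaults, and `sleep` is injectable so the delays can be observed in tests.

```python
import time

def retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn, retrying on failure with exponential backoff (0.5s, 1s, 2s, ...)."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise                       # out of attempts: surface the error
            sleep(base_delay * 2 ** i)      # back off before the next attempt
```

In production this sits alongside a circuit breaker, which stops calling a failing deployment entirely after repeated errors rather than retrying indefinitely.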
Time Series Generative Models for Traffic Scenario Generation
Implement and train time series generative models to synthesize realistic vehicle trajectory data for autonomous vehicle testing. The work involves handling high-dimensional multivariate time series, training complex models on GPU infrastructure, and evaluating generation quality.
- Model Implementation: Implemented TimeGAN (GAN-based) and Diffusion-TS (diffusion-based) architectures in PyTorch for multivariate time series generation
- Data Pipeline: Processed IKA real-world driving datasets (inD, rounD, exiD) with sliding window extraction, normalization, and data augmentation
- GPU Training: Trained models on NVIDIA H100 and Quadro RTX 6000 GPUs with hyperparameter optimization, learning rate scheduling, and early stopping
- Model Evaluation: Implemented evaluation pipeline using PCA, t-SNE visualization, and statistical metrics to assess trajectory realism and diversity
- Integration: Built conversion pipeline to XML format for CPM Remote platform, enabling generated scenarios to be used in autonomous vehicle simulation
- Production Deployment: Packaged models for inference, implemented batch generation, and integrated with existing testing infrastructure
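The sliding-window extraction and normalization from the data pipeline can be sketched as below. This is a pure-Python illustration of the idea; the actual pipeline presumably vectorizes both steps with NumPy, and the window length and stride are tuning parameters, not fixed values from the project.

```python
def sliding_windows(series, window, stride=1):
    """Extract fixed-length overlapping windows from a trajectory.

    series is a list of timesteps; each window becomes one training sample.
    """
    return [series[i:i + window]
            for i in range(0, len(series) - window + 1, stride)]

def minmax_normalize(series):
    """Scale each feature dimension (e.g. x, y, velocity) to [0, 1]."""
    dims = len(series[0])
    lo = [min(t[d] for t in series) for d in range(dims)]
    hi = [max(t[d] for t in series) for d in range(dims)]
    return [[(t[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
             for d in range(dims)]
            for t in series]
```

Overlapping windows (stride smaller than the window length) act as a cheap form of data augmentation, since each recorded trajectory yields many partially shifted training samples.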
✓ Reduced manual scenario design effort by 80% while improving coverage
✓ Discovered 50+ edge cases not covered by traditional rule-based methods
✓ Enabled continuous testing pipeline for autonomous vehicle motion planners
LLM & GNN Research for Medical Image Analysis
Generate accurate and clinically relevant radiology reports from chest X-ray images by combining vision and language models. Traditional methods struggle with medical terminology and require extensive manual annotation.
- Developed a pipeline combining BLIP (vision-language model) with LLMs for chest X-ray report generation
- LLM Deployment & Serving: Deployed LLaMA models on GPU cluster using Ollama, built REST API service with FastAPI for internal research access, managed model inference and user requests
- Conducted research on integrating Knowledge Graphs with GNNs to improve medical domain understanding
- Ran large-scale experiments on GPU cluster infrastructure for model training and evaluation, experimented with vLLM for high-throughput inference
- Built end-to-end LLM chat platform: GPU deployment → API service → web interface for research demonstrations
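Ollama streams generations as newline-delimited JSON, each chunk carrying a partial `response` string and a `done` flag on the final chunk; the API service reassembles these into the full reply. A minimal parser sketch (the field names match Ollama's documented streaming format, but error handling and metadata fields are omitted here):

```python
import json

def accumulate_stream(lines):
    """Join the token fragments from an Ollama-style NDJSON stream into one string."""
    parts = []
    for line in lines:
        if not line.strip():
            continue                      # skip keep-alive blank lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break                         # final chunk: generation finished
    return "".join(parts)
```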
✓ Demonstrated feasibility of automated radiology report generation
✓ Contributed to knowledge graph integration techniques for medical AI
Semantic Kernel with Local LLMs
Integrate locally hosted LLMs with Microsoft Semantic Kernel to enable AI orchestration without cloud dependencies, which is useful for privacy-sensitive applications.
- Developed connectors to integrate local LLM endpoints with Semantic Kernel
- Implemented function calling and semantic memory for local models
- Demonstrated how to build AI agents using on-premise infrastructure
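One common way to connect Semantic Kernel to a local model is to point its OpenAI connector at a local server (Ollama, LM Studio, and similar tools expose an OpenAI-compatible `/v1/chat/completions` endpoint). A sketch of the request such a connector issues; the base URL and model name are illustrative, and this is a simplified stand-in for the actual connector code:

```python
import json
import urllib.request

def chat_request(base_url, model, messages):
    """Build an OpenAI-compatible chat completion request for a local LLM server."""
    payload = {"model": model, "messages": messages, "stream": False}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Because the wire format matches the hosted OpenAI API, higher-level Semantic Kernel features such as function calling and semantic memory work against the local endpoint without protocol changes, subject to the local model supporting them.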