Projects
A selection of what I've designed, shipped and operated. Details are kept at the level of my public CV — the interesting parts are the architectures and the trade-offs.
LLM-as-a-Service · Flagship
The centrepiece of my work: open-source LLMs, embedding and reranker models served across multi-server GPU infrastructure. vLLM inference sits behind a LiteLLM proxy exposing OpenAI-compatible APIs, consumed by multiple teams through standardised FastAPI services — OCR, translation, transcription and image analytics as a catalogue.
Security-hardened deployments, Prometheus/Grafana observability for health, latency and usage, and the SOPs and runbooks that make it operable by more than one person. Built to be boring in the best way: dependable, monitored, documented.
Retrieval
Secure, domain-specific RAG platform: private vector stores, BGE-M3 embeddings, chunking and reranking via LlamaIndex, integrated into analyst workflows under strict data-sensitivity constraints.
Pipelines
Apache NiFi pipeline chaining email ingestion → audio transcription → automated routing to operational teams; part of a broader multi-stage pipeline covering transcription, image and video analysis, NER, classification, summarisation and hybrid search.
Agentic AI
Agent-based log analysis and diagnostic assistant, plus MCP-based tool-calling architectures connecting LLMs to internal data sources and operational systems.
Fine-tuning
Translation model fine-tuned to an organisational style guide and glossary, with an automated retraining cycle and MLflow-tracked evaluation establishing reproducible baselines.
Evaluation
Formal benchmarking of open-source LLMs across GPU on latency, quality, cost and operational constraints — reports that directly guided architecture and model-selection decisions.
Years ago I ran a small blog called Data Science for Padawans. This is its successor — practical notes on building AI platforms. First posts coming soon.
Draft
Golden test sets, retrieval vs. generation metrics, and regression suites that catch quality drift before users do.
Draft
KG-RAG is worth the complexity only when the value lives in relationships between facts — a decision framework.
Draft
vLLM + LiteLLM in production: hardening, monitoring, and the boring reliability work that makes AI adoptable.