Projects

Production systems, not demos.

A selection of what I've designed, shipped and operated. Details are kept at the level of my public CV — the interesting parts are the architectures and the trade-offs.

LLM-as-a-Service · Flagship

Enterprise AI cluster

The centrepiece of my work: open-source LLMs, embedding and reranker models served across multi-server GPU infrastructure. vLLM inference sits behind a LiteLLM proxy exposing OpenAI-compatible APIs, consumed by multiple teams through standardised FastAPI services — OCR, translation, transcription and image analytics as a catalogue.

Security-hardened deployments, Prometheus/Grafana observability for health, latency and usage, and the SOPs and runbooks that make it operable by more than one person. Built to be boring in the best way: dependable, monitored, documented.

vLLMLiteLLMFastAPICUDAPrometheusGrafana

Retrieval

Virtual Assistant — RAG based & knowledge assistant evolved to Graph based assistant

Secure, domain-specific RAG platform: private vector stores, BGE-M3 embeddings, chunking and reranking via LlamaIndex, integrated into analyst workflows under strict data-sensitivity constraints.

LlamaIndexBGE-M3vector storesreranking

Pipelines

End-to-end AI ingestion pipeline

Apache NiFi pipeline chaining email ingestion → audio transcription → automated routing to operational teams; part of a broader multi-stage pipeline covering transcription, image and video analysis, NER, classification, summarisation and hybrid search.

NiFiWhisperhybrid search

Agentic AI

Agentic log analytics

Agent-based log analysis and diagnostic assistant, plus MCP-based tool-calling architectures connecting LLMs to internal data sources and operational systems.

MCPtool callingagents

Fine-tuning

Automated translation fine-tuning cycle

Translation model fine-tuned to an organisational style guide and glossary, with an automated retraining cycle and MLflow-tracked evaluation establishing reproducible baselines.

LoRA / PEFTMLflowMADLAD / TranslateGemma

Evaluation

Open-source LLM benchmarking programme

Formal benchmarking of open-source LLMs across GPU on latency, quality, cost and operational constraints — reports that directly guided architecture and model-selection decisions.

benchmarkingCPU vs GPUmodel selection

Writing

Years ago I ran a small blog called Data Science for Padawans. This is its successor — practical notes on building AI platforms. First posts coming soon.

Draft

Evaluating RAG beyond vibes

Golden test sets, retrieval vs. generation metrics, and regression suites that catch quality drift before users do.

Draft

When a knowledge graph earns its keep

KG-RAG is worth the complexity only when the value lives in relationships between facts — a decision framework.

Draft

Serving LLMs like you mean it

vLLM + LiteLLM in production: hardening, monitoring, and the boring reliability work that makes AI adoptable.