Case studies and shipped work in LLMs, RAG, and production ML.
Reduced retrieval latency by 60% and cost by 40% for a 10M-document knowledge base using hybrid search and custom reranking.
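A minimal sketch of the general pattern, not the production system: BM25 and dense scores fused with reciprocal rank fusion, then a cross-encoder reranking the fused candidates. Model names, the toy corpus, and the RRF constant are illustrative assumptions.

```python
# Hypothetical sketch of hybrid retrieval with reciprocal rank fusion (RRF)
# and cross-encoder reranking. Corpus and model names are illustrative.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder

docs = [
    "Hybrid search combines lexical and semantic signals.",
    "Cross-encoders rescore candidate passages jointly with the query.",
    "BM25 remains a strong lexical baseline for retrieval.",
]

# Lexical index: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([d.lower().split() for d in docs])

# Dense index: precomputed embeddings, cosine similarity at query time.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def hybrid_search(query: str, k: int = 3, rrf_k: int = 60) -> list[str]:
    # Score both channels independently.
    lex = bm25.get_scores(query.lower().split())
    sem = doc_vecs @ embedder.encode(query, normalize_embeddings=True)

    # Reciprocal rank fusion: sum 1 / (rrf_k + rank) across channels,
    # so neither score scale dominates the other.
    fused = np.zeros(len(docs))
    for scores in (lex, sem):
        for rank, idx in enumerate(np.argsort(scores)[::-1]):
            fused[idx] += 1.0 / (rrf_k + rank + 1)
    candidates = np.argsort(fused)[::-1][:k]

    # Rerank only the fused shortlist with a cross-encoder; this keeps
    # the expensive model off the full corpus, which is where most of
    # the latency and cost savings come from.
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    pairs = [(query, docs[i]) for i in candidates]
    order = np.argsort(reranker.predict(pairs))[::-1]
    return [docs[candidates[i]] for i in order]

print(hybrid_search("how does hybrid search work?"))
```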
Shipped an on-device LLM experience for a productivity app. Optimized for latency and battery; 2M+ monthly active users.
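The shipped app runs on a mobile runtime, not the desktop library below; llama-cpp-python stands in here only to illustrate the kind of latency- and battery-minded knobs involved (quantized weights, small context, bounded threads). The model path and parameter values are hypothetical.

```python
# Illustrative only: desktop stand-in for an on-device mobile runtime.
from llama_cpp import Llama

llm = Llama(
    model_path="models/phi-3-mini-q4_k_m.gguf",  # hypothetical path; 4-bit quantized weights load faster and use less RAM
    n_ctx=2048,    # short context keeps the KV cache small
    n_threads=4,   # cap threads to limit power draw on battery
    verbose=False,
)

out = llm(
    "Summarize: meeting moved to 3pm Thursday.",
    max_tokens=64,     # bound generation length to bound latency
    temperature=0.2,
)
print(out["choices"][0]["text"])
```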
Built an internal platform for tracing, logging, and cost attribution across 50+ model endpoints. Cut debugging time by 70%.
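A sketch of the core idea, assuming token-based pricing: wrap every model call to record latency and compute cost per endpoint and team. The price table, endpoint names, and `call_model` stub are hypothetical; the real platform's storage and transport are not shown.

```python
# Minimal per-call tracing and cost-attribution sketch. All names and
# prices are illustrative assumptions, not the actual platform.
import time
import uuid
from dataclasses import dataclass, field

PRICE_PER_1K = {"gpt-4o-mini": (0.00015, 0.0006)}  # (input, output) USD, illustrative

@dataclass
class Trace:
    endpoint: str
    team: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    latency_s: float = 0.0
    cost_usd: float = 0.0

TRACES: list[Trace] = []

def traced_call(endpoint: str, team: str, prompt: str) -> str:
    t = Trace(endpoint=endpoint, team=team)
    start = time.perf_counter()
    text, tokens_in, tokens_out = call_model(endpoint, prompt)  # provider call, stubbed below
    t.latency_s = time.perf_counter() - start
    p_in, p_out = PRICE_PER_1K[endpoint]
    t.cost_usd = tokens_in / 1000 * p_in + tokens_out / 1000 * p_out
    TRACES.append(t)  # real system: emit to a trace store, keyed by trace_id
    return text

def call_model(endpoint: str, prompt: str) -> tuple[str, int, int]:
    return "ok", len(prompt.split()), 12  # stub: (text, input tokens, output tokens)

traced_call("gpt-4o-mini", "search-team", "hello world")
print({t.team: round(t.cost_usd, 6) for t in TRACES})  # cost rollup by team
```

Keying every record by endpoint and team is what makes both cost attribution and debugging fast: a regression or a cost spike can be filtered to one owner instead of grepping unstructured logs.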