Writing
Engineering notes, mostly about systems and applied ML. I try to name what broke, not only what worked.
Building a Real-Time Bus Prediction System for Madison Metro
Live ML that corrects the transit API's ETAs with a 47-feature XGBoost model and Mondrian conformal prediction, retrained nightly behind a hard deploy gate.

Deploying RAG in AWS Bedrock: Benchmarking 9 LLMs on the WattBot Challenge
Ensemble majority voting beat every individual model. The highest-citation model finished last. A serverless RAG pipeline on Bedrock with full cost tracking.

Building a Speculative Decoding Engine from Scratch
Custom Triton kernels, tree-structured attention, four bugs, and an honest negative result: 0.66x baseline. The full arc from 0.08x to 0.66x and what it taught me.