Dong Wang A machine learning blog

Overview of RL for LLMs: Algorithms and Scaling

This post is a working tour of reinforcement learning for LLMs — the algorithms and the systems that run them at scale. Parts 1–3 cover the math and reference implementation of DPO/PPO/GRPO using TRL as the readable baseline. Parts 4–5 climb up to research-grade variants (REINFORCE++, RLOO, Dr. GRPO, multi-turn...

Understanding LLM Inference Through Nano-vLLM

Nano-vLLM is a lightweight vLLM implementation built from scratch in ~1,200 lines of Python. It achieves inference speed comparable to vLLM's while remaining readable end-to-end. This post walks through the architecture, answers common questions about how the key systems work, and profiles the engine on Qwen3-4B to see where the...

Nanochat: A Deep Dive

Nanochat is a GPT-2-beating LLM training codebase by Karpathy — tokenizer, pretraining, SFT, RL, and eval in readable single-file scripts. No frameworks, no abstractions you can’t trace end-to-end. The speedrun pipeline trains a 24-layer model that beats GPT-2’s CORE score on wall clock, using value embeddings, FP8 training, and a...

Distributed Systems in Industry

In this post, we will look at several practical large-scale distributed systems built especially for internet applications: Facebook Graph Search (Memcache, TAO graph service, Unicorn social graph search); Google (Tail at Scale, Bigtable, GFS); NoSQL. Facebook Graph Search R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee, H....

Machine Learning for Relevance

In this post, we will look at several practical machine learning algorithms and their industrial applications: SVM; XGBoost; Conditional Random Fields; Neural Collaborative Filtering; Yahoo! Learning to Rank; Metrics; Google Ads CTR; Bing Sponsored Search CTR; Facebook Ads CTR; Google Play Recommender: Wide and Deep; Didi ETA: Wide, Deep and...