11 Apr 2026
Nano-vLLM is a lightweight vLLM implementation built from scratch in ~1,200 lines of Python. It achieves comparable inference speed to vLLM while remaining readable end-to-end. This post walks through the architecture, answers common questions about how the key systems work, and profiles the engine on Qwen3-4B to see where the...
25 Mar 2026
Nanochat is a GPT-2-beating LLM training codebase by Karpathy — tokenizer, pretraining, SFT, RL, and eval in readable single-file scripts. No frameworks, no abstractions you can’t trace end-to-end. The speedrun pipeline trains a 24-layer model that beats GPT-2’s CORE score on wall clock, using value embeddings, FP8 training, and a...
22 Sep 2018
In this blog, we will look at several practical large-scale distributed systems built especially for internet applications. Facebook Graph Search MemCache TAO Graph Service Unicorn social graph search Google Tail at Scale Bigtable GFS NoSQL Facebook Graph Search R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee, H....
15 Sep 2018
In this blog, we will look at several practical machine learning algorithms and their industrial applications. SVM XGBoost Conditional Random Fields Neural Collaborative Filtering Yahoo! Learning to Rank Metrics Google Ads CTR Bing Sponsored Search CTR Facebook Ads CTR Google Play Recommender: Wide and Deep Didi ETA: Wide, Deep and...
08 Sep 2018
Autonomous vehicles use sensors to perceive the world around them. This blog considers two papers on 3D object detection using either Lidar or camera images. Lidar has depth information, but it is sparse, which makes 3D convolution inefficient. Images have dense semantic information, but they have occlusion issues, and depth...