Dong Wang A machine learning blog

Nanochat: A Deep Dive

Nanochat is a GPT-2-beating LLM training codebase by Karpathy: tokenizer, pretraining, SFT, RL, and eval in readable single-file scripts. No frameworks, no abstractions you can’t trace end-to-end. The speedrun pipeline trains a 24-layer model that beats GPT-2’s CORE score on wall clock, using value embeddings, FP8 training, and a...

Distributed Systems in Industry

In this blog, we will look at several practical large-scale distributed systems built especially for internet applications. Facebook Graph Search MemCache TAO Graph Service Unicorn social graph search Google Tail at Scale Bigtable GFS NoSQL Facebook Graph Search R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee, H....

Machine Learning for Relevance

In this blog, we will look at several practical machine learning algorithms and their industrial applications. SVM XGBoost Conditional Random Fields Neural Collaborative Filtering Yahoo! Learning to Rank Metrics Google Ads CTR Bing Sponsored Search CTR Facebook Ads CTR Google Play Recommender: Wide and Deep Didi ETA: Wide, Deep and...

3D Object Detection

Autonomous driving uses sensors to perceive the world around it. This blog considers two papers on 3D object detection using either lidar or camera images. Lidar has depth information, but it is sparse, which makes 3D convolution inefficient. Images have dense semantic information, but they have occlusion issues, and depth...

Behavioral Planning

Planning for self-driving vehicles consists of route planning, behavioral planning, and motion planning. Route planning picks a sequence of road segments. The behavioral planner generates discrete motion goals (location, speed) that adhere to the rules of the road. It specifies the desired lane and speed. One local goal can be driving down this lane reaching location...