FIG. 13 · ATLAS
Gradient Boosting
Trees, but each one fixes the last.
Add trees one at a time, each fitted to the residual error of the ensemble so far. XGBoost and LightGBM dominate Kaggle leaderboards for a reason: a sequence of weak learners, taken together, is rarely beaten on tabular data.
Move the iteration slider to watch the boundary form. Each step adds a small decision tree that fits the residuals of the cumulative prediction. Lower learning rate means slower but smoother learning. Depth controls how flexible each individual tree is.
§ I The ensemble, growing
Move iterations from 1 to 100. Each step adds one tree fitted to current residuals.
§ II How it works
Start with a constant prediction (zero, or the class mean). Compute the residual at each training point: how far off is the current prediction? Fit a small decision tree to predict those residuals. Add it to the ensemble, scaled by the learning rate. Repeat.
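A minimal sketch of that loop for regression, using scikit-learn's DecisionTreeRegressor as the weak learner. It mirrors the demo's basic algorithm rather than any production library, and the parameter names are purely illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    """Plain gradient boosting for squared-error loss (illustrative sketch)."""
    f0 = y.mean()                       # start with a constant prediction
    pred = np.full(len(y), f0)          # current ensemble prediction F_m(x)
    trees = []
    for _ in range(n_estimators):
        residuals = y - pred            # how far off is the ensemble right now?
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)          # small tree fitted to the residuals
        pred += learning_rate * tree.predict(X)   # add the scaled correction
        trees.append(tree)
    return f0, trees

def predict(X, f0, trees, learning_rate=0.1):
    """Cumulative sum of the constant plus every tree's scaled output."""
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred
```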
The trick is that each tree fixes the previous ensemble's mistakes. Random Forest averages independent trees. Gradient Boosting chains them. The dependency means boosting trains slower (you can't parallelize across trees the way you can with a forest) but the sequential corrections produce tighter fits with the same total tree budget.
Modern implementations — XGBoost, LightGBM, CatBoost — add regularization, smarter split-finding, histogram-based binning, and second-order gradient information. The algorithm in this demo is the basic version; production libraries are several generations more sophisticated.
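A rough illustration of how those additions surface as knobs in the library APIs; the parameter values below are arbitrary placeholders, not tuned recommendations.

```python
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# XGBoost: L2 penalty (reg_lambda), minimum split gain (gamma),
# and histogram-based split finding (tree_method="hist").
xgb_model = XGBClassifier(n_estimators=300, learning_rate=0.05, max_depth=4,
                          reg_lambda=1.0, gamma=0.5, tree_method="hist")

# LightGBM: leaf-wise growth capped by num_leaves, histogram binning via max_bin.
lgbm_model = LGBMClassifier(n_estimators=300, learning_rate=0.05,
                            num_leaves=31, max_bin=255, reg_lambda=1.0)

# Both expose the familiar scikit-learn interface:
#   xgb_model.fit(X_train, y_train); xgb_model.predict_proba(X_test)
```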
The math
For squared-error loss, given predictions F_m(x) after m iterations, the residual at each training point is

r_i^(m+1) = y_i − F_m(x_i)

Fit a regression tree h_(m+1) to those residuals, then update:

F_(m+1)(x) = F_m(x) + ν · h_(m+1)(x)

where ν is the learning rate. After M iterations, the prediction is the cumulative sum of all tree outputs scaled by ν. For classification, replace squared error with log-loss; the residuals then become gradients of the log-loss, i.e. the gap between the true label and the predicted probability.
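For the binary case, a sketch of that pseudo-residual (the negative gradient of the log-loss with respect to the raw score F), assuming 0/1 labels:

```python
import numpy as np

def logloss_residuals(y, F):
    """Pseudo-residuals for binary log-loss.

    y: 0/1 labels; F: current ensemble scores in log-odds space.
    Each boosting iteration fits the next tree to these values.
    """
    p = 1.0 / (1.0 + np.exp(-F))    # sigmoid turns scores into probabilities
    return y - p                    # shrinks toward 0 as predictions improve
```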
§ III Where it shines, where it breaks
Tabular accuracy ceiling
For mixed-type tabular data — the realm of finance, e-commerce, retention — gradient boosting is roughly the ceiling of what's achievable without deep learning. Often beats neural networks at the same problem.
Robust to dirty data
Trees handle missing values, mixed scales, and irrelevant features; the major libraries add monotonic constraints on top. XGBoost in particular has direct support for missing-value branches and feature-interaction constraints.
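A small sketch of both points with XGBoost's scikit-learn wrapper; native NaN handling and the monotone_constraints parameter are real XGBoost features, but the tiny dataset here is invented for illustration.

```python
import numpy as np
from xgboost import XGBRegressor

# Toy data with a missing value; XGBoost learns a default branch
# direction for NaNs at each split instead of requiring imputation.
X = np.array([[1.0, 200.0],
              [2.0, np.nan],
              [3.0, 150.0],
              [4.0, 180.0]])
y = np.array([10.0, 12.0, 14.0, 18.0])

# monotone_constraints: +1 forces predictions to be non-decreasing in
# feature 0; 0 leaves feature 1 unconstrained.
model = XGBRegressor(n_estimators=50, max_depth=2,
                     monotone_constraints=(1, 0))
model.fit(X, y)
print(model.predict(X))
```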
Sequential training
Trees can't be built in parallel because each depends on the residuals from the last. LightGBM mitigates this with histogram binning, but the fundamental dependency is real. Random forests are dramatically faster to train.
Overfit risk on noisy labels
Try the noisy preset above and crank n_estimators to 100. The boundary memorizes the noise. Cross-validation, early stopping, and regularization (γ, λ in XGBoost) are not optional in production.
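A sketch of early stopping plus those penalties using XGBoost's native training API; the synthetic noisy data below is a stand-in for the demo's noisy preset.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Synthetic noisy binary labels (stand-in for the demo's noisy preset).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
dtrain = xgb.DMatrix(X_tr, label=y_tr)
dval = xgb.DMatrix(X_val, label=y_val)

params = {"objective": "binary:logistic", "max_depth": 3, "eta": 0.1,
          "gamma": 1.0, "lambda": 1.0}   # split-gain and L2 penalties

# Stop adding trees once validation loss hasn't improved for 10 rounds,
# instead of blindly running all 100 iterations.
booster = xgb.train(params, dtrain, num_boost_round=100,
                    evals=[(dval, "val")], early_stopping_rounds=10)
print("best iteration:", booster.best_iteration)
```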
§ IV Trade-off scorecard
- Inference: 0.55
- Accuracy: 0.90
- Training: 0.45
- Small size: 0.40
§ V In production
Airbnb's price recommendations. A multi-stage gradient-boosted system generates pricing suggestions for hosts across millions of listings worldwide. The model consumes hundreds of features — seasonality, neighborhood, photo quality scores, local events — and outputs a single nightly rate. XGBoost was the first algorithm to beat Airbnb's hand-tuned pricing heuristics, and it's been the workhorse since.
§ VI Compare to
Random Forest
Parallel trees · faster training
Decision Tree
Single tree · interpretable
Neural Network (MLP)
Deep learning · phase 3