Vol. XII · No. 05 · May 2026
Jake Cuth.
From the Model Atlas

A committee of decision trees.

Hundreds of trees trained on bootstrapped samples and random feature subsets, averaged into a single prediction. Famously hard to beat on tabular data, and forgiving of every kind of bad data hygiene.

The boundary you see at the top is the majority vote across N trees. Underneath, the first five trees in the forest are drawn individually so you can see how each one is wrong in its own way, and how averaging that variety of wrongness produces a smooth, correct boundary.


Crank n_trees from 1 to 100. The boundary smooths from jagged to confident. Push max_depth to 10 and watch each individual tree become eager to memorize — the forest still smooths it out.

Averaged probability surface

First five trees in the forest

Each of these is one tree from the ensemble, trained on a different bootstrap sample with a different random feature picked at each split. No tree is great. The vote across all of them is.


Train each tree on a bootstrap sample — a random selection of training rows drawn with replacement, so some rows appear twice and some not at all. At every split, restrict the candidate features to a random subset (this is the "random" in random forest, beyond bootstrapping). The two sources of randomness mean the trees disagree, and disagreement is the engine of the ensemble.
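The two sources of randomness can be sketched in a few lines of NumPy. This is an illustration of the sampling described above, not the demo's actual implementation; the sizes are made up, and sqrt(n_features) is the common default subset size for classification:

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_features = 8, 4

# Bootstrap: draw n_rows row indices WITH replacement,
# so some rows repeat and some are left out entirely.
boot = rng.integers(0, n_rows, size=n_rows)

# The rows that were never drawn are "out-of-bag" for this tree.
oob = np.setdiff1d(np.arange(n_rows), boot)

# Feature bagging: at each split, only a random subset of
# features is eligible (sqrt(n_features) is a common default).
k = max(1, int(np.sqrt(n_features)))
candidates = rng.choice(n_features, size=k, replace=False)

print(sorted(boot.tolist()), oob.tolist(), sorted(candidates.tolist()))
```

Repeating this draw per tree (and the feature draw per split) is what makes every tree in the forest see a slightly different problem.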

For prediction, run a new point down every tree and take the majority vote. The probability surface you see in the demo is the fraction of trees that voted for class 1 at each grid cell. Where the trees agree the surface is dark; where they're undecided the shading is faint — that's the model's honest uncertainty.
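The voting rule itself is one line. A minimal sketch with made-up 0/1 votes from ten hypothetical trees for a single point:

```python
import numpy as np

# Each entry is one tree's class-1 vote (0 or 1) for the same point.
votes = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])

p_hat = votes.mean()        # fraction of trees voting class 1: 0.7
y_hat = int(p_hat >= 0.5)   # majority vote: class 1

print(p_hat, y_hat)  # 0.7 1
```

Evaluating `p_hat` at every grid cell is exactly what produces the shaded probability surface: dark where the fraction is near 0 or 1, faint where the trees split close to 50/50.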

The math

For T trees voting on a point x:

f̂(x) = mode( T₁(x), T₂(x), …, T_T(x) )

For probabilities (when needed):

P̂(y=1 | x) = (1/T) Σᵢ Tᵢ(x), where each vote Tᵢ(x) ∈ {0, 1}

Each tree T_i is trained on a bootstrap sample D_i ∼ D with feature bagging. The out-of-bag (OOB) points — the rows not picked for D_i — give an honest validation estimate at zero extra cost.
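The OOB estimate is available off the shelf. A sketch using scikit-learn (an assumption; the demo's own implementation isn't shown) on a synthetic two-moons dataset:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)

# oob_score=True scores every training row using only the trees
# whose bootstrap sample did NOT include that row.
forest = RandomForestClassifier(
    n_estimators=100, oob_score=True, random_state=0
).fit(X, y)

print(forest.oob_score_)  # honest accuracy estimate, no held-out set
```

Because each row is judged only by trees that never saw it, `oob_score_` behaves like a built-in cross-validation score at essentially zero extra cost.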


Shines

Tabular accuracy, almost free

Default settings on a clean tabular dataset routinely produce a baseline that takes weeks of work to beat. Feature engineering helps, hyperparameter tuning helps, but the floor is already high.
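To see the high floor concretely, here is an untuned scikit-learn forest cross-validated on a stock dataset (the dataset choice is illustrative, not from the original):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# All defaults: 100 trees, sqrt(n_features) per split, unbounded depth.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
print(scores.mean())  # typically well above 0.9 on this dataset
```

No scaling, no feature engineering, no tuning, and the baseline is already strong.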

Shines

Robust to messy data

Outliers, mixed scales, irrelevant features, missing values (with simple imputation) — a random forest mostly shrugs. It is the model that punishes preprocessing fragility least.

Breaks

Inference cost at scale

100 trees, 10 levels deep each, run for every prediction. That's fine for batch scoring. For real-time inference at high QPS, consider gradient boosting, which often reaches comparable accuracy with fewer, shallower trees, or distill the forest into a smaller model.

Breaks

Lost interpretability

A single tree is a flowchart. A hundred trees averaged is a weather pattern. Feature importance metrics (Gini, permutation) give a directional read but not the per-prediction explanation a regulator can follow.
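Permutation importance, the second metric mentioned above, can be computed directly in scikit-learn (assumed here; the synthetic dataset is illustrative, with the two informative features placed first via shuffle=False):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=400, n_features=6, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Shuffle one feature at a time and measure the accuracy drop:
# a directional read on which features matter overall,
# not a per-prediction explanation.
result = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
print(result.importances_mean.round(3))
```

The informative features show large drops and the noise features hover near zero, which tells you *what* the forest leans on globally, but still not *why* it classified any individual point.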


Directional, not exact. Inference cost varies with tree count and depth.

Inference 0.55 · Accuracy 0.85 · Training 0.55 · Small size 0.30

Microsoft Kinect's body-pose tracking. The original Xbox 360 Kinect classified each pixel of an infrared depth image into one of 31 body parts using a random forest of three trees with 20 levels each — trained on a million synthetic poses, evaluated at 200 FPS on a console GPU. Tabular accuracy showing up in real-time computer vision.

