FIG. 13 · ATLAS Decision Tree
A flowchart that fits the data.
Recursive binary splits on the feature that best separates the target. Reads like a flowchart, handles mixed data without preprocessing, but overfits without depth limits.
Two views of the same model. On the left: the feature space, partitioned into rectangles. On the right: the tree itself, with split conditions inside each interior node and class labels at the leaves. Move the depth slider to feel the trade-off between underfit and overfit.
§ I The tree, drawn twice
Watch how each split slices the feature space with one straight cut at a time. Set depth = 1 for a stump (one cut). Crank to 10 to see the overfit boundary — the tree memorizes individual points.
§ II How it works
Decision trees fit recursively. At each node, the algorithm tries every feature and every candidate threshold, picking the split that produces the purest two children. "Purity" here is Gini impurity: the chance a randomly drawn point from the node would be misclassified if it were labeled at random according to the node's class proportions. Lower is better.
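That purity measure is small enough to write out directly. A minimal sketch (the function name `gini` is ours, not from any library):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: the chance a random point from this node is
    mislabeled if its label is drawn from the node's class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["a", "a", "a", "a"]))   # pure node → 0.0
print(gini(["a", "a", "b", "b"]))   # 50/50 split → 0.5
```

A pure node scores 0; a perfectly mixed two-class node scores 0.5, the worst case for binary labels.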
Splits are axis-aligned — always either "x < threshold" or "y < threshold," never diagonal. That's why the partition map looks like a quilt of rectangles. It's also why a single tree is interpretable: the path from root to leaf is a list of inequalities you can read aloud.
The math
For a node containing classes with proportions p_k, the Gini impurity is:
G = 1 − Σ p_k²

The algorithm picks the (feature, threshold) pair that minimizes the weighted sum of Gini over the two child nodes:

argmin_(f, t) [ (n_L / n) · G_L + (n_R / n) · G_R ]

Recursion stops at max_depth, when a node becomes pure, or when no split lowers impurity. Each leaf's prediction is its majority class.
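The argmin above is an exhaustive search, and a toy version fits in a few lines. A sketch under our own naming (`best_split`, with `gini` redefined inline so the snippet is self-contained); real implementations sort once per feature and update counts incrementally instead of rescanning:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Try every (feature, threshold) pair; return the one minimizing
    the weighted child impurity (n_L/n)·G_L + (n_R/n)·G_R."""
    n = len(y)
    best_f, best_t, best_score = None, None, float("inf")
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: midpoints between consecutive feature values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][f] < t]
            right = [y[i] for i in range(n) if X[i][f] >= t]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best_score:
                best_f, best_t, best_score = f, t, score
    return best_f, best_t, best_score

# Two classes cleanly separated along feature 0.
X = [[1, 5], [2, 6], [8, 5], [9, 6]]
y = [0, 0, 1, 1]
print(best_split(X, y))  # → (0, 5.0, 0.0): split feature 0 at 5.0, pure children
```

Recursing on the two halves until a stopping rule fires is all a full tree-fitting routine adds.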
§ III Where it shines, where it breaks
Mixed-type tabular
Numerical features, categorical features, ordinal features, missing values: a tree handles all of them without one-hot encoding, scaling, or imputation. Real production data is messy. Trees are the model that doesn't mind.
Audit-friendly rules
A regulated lender can't ship a neural network without an interpretability layer. A pruned decision tree IS the interpretation. Every prediction is a path of three to five rules that a compliance officer can read.
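That path-of-rules property is directly inspectable. A sketch using scikit-learn's `export_text`, which prints a fitted tree as nested if/else rules (the dataset choice here is illustrative, not from the original):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
# A pruned tree: max_depth=3 keeps every root-to-leaf path to three rules.
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

rules = export_text(clf, feature_names=list(data.feature_names))
print(rules)
```

Each leaf line in the printout is the end of a path a compliance officer can read aloud: a short conjunction of threshold inequalities ending in a class.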
Overfitting at depth
Crank the depth slider to 10 above. The boundary becomes a haze of narrow rectangles, each carved out around a single training point. Train accuracy climbs to 100%; test accuracy collapses. This is why Random Forest and Gradient Boosting exist.
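The slider's train/test gap is easy to reproduce numerically. A sketch with scikit-learn's `make_moons` (a stand-in for the figure's data, which we don't have):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy two-class data: some overlap, so a perfect fit must be memorization.
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

for depth in (1, 3, 10):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(Xtr, ytr)
    print(f"depth={depth:2d}  train={clf.score(Xtr, ytr):.2f}  test={clf.score(Xte, yte):.2f}")
```

At depth 10 the train score approaches 1.0 while the test score sags below the shallower trees: the haze of narrow rectangles, in numbers.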
Diagonal boundaries
Try the spiral preset. Axis-aligned splits can only approximate a diagonal with stair-steps. The tree gets there with enough depth, but the staircase shape is a sign you're paying for the wrong kind of decision surface.
§ IV Trade-off scorecard
Directional, not exact. Reflects shallow trees with reasonable depth limits.
- Inference: 0.85
- Accuracy: 0.70
- Training: 0.85
- Small size: 0.80
§ V In production
Credit scoring at FICO and the German credit bureaus. Pruned, hand-audited decision trees meet the interpretability requirements that regulated lending demands. The same data pumped through gradient boosting would score better, but a tree's path-to-leaf is the explanation a regulator can read aloud.
§ VI Compare to
Random Forest
Many trees, averaged · better accuracy
Gradient Boosting
Trees, sequentially · phase 2
Logistic Regression
Linear, also interpretable