Vol. XII · No. 05 · May 2026
Jake Cuth.

No training.
Just memory.

There is no model. To predict, find the k closest training points and vote. Trivially simple, often surprisingly competitive — but every prediction scans the dataset, which is the catch.

Click anywhere to add a training point of the active class. The background shading is the predicted class for every region of the canvas. Move the k slider to feel the bias-variance trade-off — k=1 is jagged and overfit, k=15 is smooth and biased.


Every grid cell shows the prediction. The decision boundary is wherever neighbor votes flip.


KNN has no training step. The model is the training set. Predict by computing the distance from the new point to every training point, sorting, taking the top k, and majority-voting their labels. Done.
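A minimal sketch of that procedure in NumPy; the function and variable names are illustrative, not part of the demo:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    # The full scan: distance from the query to every training point.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    # Majority vote over their labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage: two 2-D classes.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 0.9]), k=3))  # -> 1
```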

The "no training" claim is real but lopsided. You skip the training cost and pay it back at prediction time, on every single query. Brute-force KNN scans the entire dataset for each prediction; production KNN uses spatial indexes (KD-trees, ball trees, locality-sensitive hashing) to cut that cost.

The math

For a query point x and training set {(x_i, y_i)}:

N_k(x) = argmin_(I ⊂ {1..n}, |I|=k) Σ_(i ∈ I) ‖x − x_i‖²

The prediction is then the mode (or mean, for regression) of {y_i : i ∈ N_k(x)}.

Distance metric matters. Euclidean is the default; Manhattan, cosine, and Mahalanobis cover other geometries. Feature scaling matters more — KNN treats all dimensions equally, so a feature in [0, 100000] dominates one in [0, 1].
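One way to handle both at once, sketched as a scikit-learn pipeline; the toy numbers are invented to exaggerate the scale mismatch:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Two features on wildly different scales: one in the tens of thousands, one in [0, 1].
X = np.array([[80000, 0.2], [82000, 0.9], [20000, 0.3], [22000, 0.8]])
y = np.array([1, 1, 0, 0])

# Standardize first so both dimensions contribute to the distance;
# the metric is swappable ("euclidean", "manhattan", "cosine", ...).
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3, metric="manhattan"))
knn.fit(X, y)
print(knn.predict([[79000, 0.5]]))  # -> [1]
```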


Shines

Tiny datasets

With fewer than a few thousand points, KNN's "no model" is a feature: there is no architecture to misspecify, and the full scan is cheap. The data IS the model.

Shines

Local structure

When class membership depends on local neighborhoods rather than global linear or polynomial trends, KNN naturally captures that. Image and signal retrieval lean on it for this reason.

Breaks

High-dimensional spaces

The curse of dimensionality is brutal here. In 100 dimensions, every point is roughly equidistant from every other, and "nearest" loses meaning. PCA or feature selection first.
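A rough sketch of that remedy with scikit-learn, on synthetic 100-dimensional data; the exact scores will vary with the dataset and the seed:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# 500 points in 100 dimensions, with only 10 informative directions.
X, y = make_classification(n_samples=500, n_features=100, n_informative=10, random_state=0)

raw = KNeighborsClassifier(n_neighbors=5)
reduced = make_pipeline(PCA(n_components=10), KNeighborsClassifier(n_neighbors=5))

print("raw 100-D:  ", cross_val_score(raw, X, y, cv=5).mean())
print("PCA to 10-D:", cross_val_score(reduced, X, y, cv=5).mean())
```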

Breaks

Inference at high QPS

Try the sparse preset and crank k to 15. Each prediction sorts the entire dataset. For a million-point production index, you cannot afford that. KD-trees help up to ~30 dimensions, then degrade.
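To feel the crossover on real hardware, a rough timing sketch with scikit-learn's NearestNeighbors; sizes and timings are illustrative and machine-dependent:

```python
import time
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Query time for a KD-tree vs. brute force as dimensionality grows.
for d in (2, 10, 30, 100):
    X = rng.normal(size=(20_000, d))
    queries = rng.normal(size=(500, d))
    for algorithm in ("kd_tree", "brute"):
        nn = NearestNeighbors(n_neighbors=15, algorithm=algorithm).fit(X)
        t0 = time.perf_counter()
        nn.kneighbors(queries)
        print(f"d={d:>3}  {algorithm:8s}  {time.perf_counter() - t0:.2f}s")
```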


[Scorecard: Inference 0.30 · Accuracy 0.65 · Training 1.00 · Small size 0.20]

Spotify's "More like this" similarity search. Approximate KNN over learned embeddings — not raw audio — powers the "fans also like" rails on every artist page. The embeddings are learned by neural networks; the retrieval is approximate KNN over an HNSW index, because nothing else matches that combination of semantic flexibility and scale.
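A sketch of the same retrieval pattern with the open-source hnswlib library, using random vectors as stand-ins for learned embeddings; this is illustrative, not Spotify's actual stack:

```python
import numpy as np
import hnswlib

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100_000, 64)).astype(np.float32)  # stand-in for track embeddings

# Build an HNSW index over cosine distance, then answer "more like this" queries approximately.
index = hnswlib.Index(space="cosine", dim=64)
index.init_index(max_elements=len(embeddings), ef_construction=200, M=16)
index.add_items(embeddings, np.arange(len(embeddings)))
index.set_ef(50)  # query-time recall/speed knob

neighbor_ids, distances = index.knn_query(embeddings[0], k=10)
print(neighbor_ids)  # ids of the 10 nearest tracks, including the query itself
```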

