FIG. 13 · ATLAS
K-Nearest Neighbors
No training.
Just memory.
There is no model. To predict, find the k closest training points and vote. Trivially simple, often surprisingly competitive — but every prediction scans the dataset, which is the catch.
Click anywhere to add a training point of the active class. The background shading is the predicted class for every region of the canvas. Move the k slider to feel the bias-variance trade-off — k=1 is jagged and overfit, k=15 is smooth and biased.
§ I · The boundary is the data
Every grid cell shows the prediction. The decision boundary is wherever neighbor votes flip.
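One way to see this concretely: shade a grid of query points with whatever a KNN classifier predicts there. A minimal sketch using scikit-learn — the toy data, grid resolution, and library choice are illustrative, not what the demo itself runs:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training set: 2-D points with binary labels (stand-ins for the clicked points).
X = np.array([[0.2, 0.3], [0.4, 0.8], [0.7, 0.2], [0.8, 0.9], [0.5, 0.5]])
y = np.array([0, 0, 1, 1, 0])

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Predict the class at every cell of a coarse grid; the class changes exactly
# where the neighbor vote flips -- that is the decision boundary.
xx, yy = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
grid = np.column_stack([xx.ravel(), yy.ravel()])
background = clf.predict(grid).reshape(xx.shape)
```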
§ II · How it works
KNN has no training step. The model is the training set. Predict by computing the distance from the new point to every training point, sorting, taking the top k, and majority-voting their labels. Done.
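That whole recipe fits in a few lines. A brute-force sketch in NumPy — the function name and defaults are illustrative:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Predict the label of x_query by majority vote over its k nearest neighbors."""
    # Distance from the query to every training point -- the full scan.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # Majority vote over the neighbors' labels.
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```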
The "no training" claim is real but lopsided. You skip the training cost and pay it back at prediction time, on every single query. Brute-force KNN scans the entire dataset for each prediction; production KNN uses spatial indexes (KD-trees, ball trees, locality-sensitive hashing) to cut that cost.
The math
For a query point x and training set {(x_i, y_i)}:
N_k(x) = argmin_(I ⊂ {1..n}, |I|=k) Σ_(i ∈ I) ‖x − x_i‖²

The prediction is then the mode (or mean, for regression) of {y_i : i ∈ N_k(x)}.
Distance metric matters. Euclidean is the default; Manhattan, cosine, and Mahalanobis cover other geometries. Feature scaling matters more — KNN treats all dimensions equally, so a feature in [0, 100000] dominates one in [0, 1].
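A toy illustration of why scaling matters — the numbers are made up, but the effect is generic:

```python
import numpy as np

# Feature 0 is income in dollars (0..100000), feature 1 is a rate in [0, 1].
a = np.array([50_000.0, 0.10])
b = np.array([50_100.0, 0.90])   # nearly identical income, very different rate
c = np.array([45_000.0, 0.11])   # quite different income, nearly identical rate

print(np.linalg.norm(a - b))     # ~100.0  -> looks "close"
print(np.linalg.norm(a - c))     # ~5000.0 -> looks "far"

# After standardizing each feature (subtract the mean, divide by the standard
# deviation, both computed on the training set), the two dimensions contribute
# on the same scale and the rate difference is no longer drowned out by dollars.
```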
§ III · Where it shines, where it breaks
Tiny datasets
With fewer than a few thousand points, KNN's "no model" is a feature: there is no architecture to misspecify and overfit. The data IS the model.
Local structure
When class membership depends on local neighborhoods rather than global linear or polynomial trends, KNN naturally captures that. Image and signal retrieval lean on it for this reason.
High-dimensional spaces
The curse of dimensionality is brutal here. In 100 dimensions, every point is roughly equidistant from every other, and "nearest" loses meaning. PCA or feature selection first.
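The concentration effect is easy to reproduce with random data — the dimensions and sample count below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((1_000, d))
    q = rng.random(d)
    dists = np.linalg.norm(X - q, axis=1)
    # As d grows, the nearest and farthest points end up nearly the same
    # distance from the query, so "nearest" carries less and less information.
    print(d, dists.min() / dists.max())
```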
Inference at high QPS
Try the sparse preset and crank k to 15. Each prediction sorts the entire dataset. For a million-point production index, you cannot afford that. KD-trees help up to ~30 dimensions, then degrade.
§ IV · Trade-off scorecard
- Inference: 0.30
- Accuracy: 0.65
- Training: 1.00
- Small size: 0.20
§ V · In production
Spotify's "More like this" similarity search. Approximate KNN over learned embeddings — not raw audio — powers the "fans also like" rails on every artist page. The embeddings are learned by neural networks; the retrieval is brute KNN with HNSW indexing, because nothing else matches its semantic flexibility at that scale.