Vol. XII · No. 05 · May 2026
Jake Cuth.
← from the Model Atlas

Clusters without
committing to k.

Density-based clustering. Points in dense neighborhoods form clusters. Sparse points get labeled noise — and that's a useful signal, not a failure mode. No need to pick k. Two parameters take its place.

Move eps to change the neighborhood radius. Move minPts to change how many points constitute a "dense" region. Faded gray dots are noise — points that didn't make it into any cluster. Try the rings preset to feel where K-Means stalls and DBSCAN does the obvious thing.


Points within eps of a core point join that cluster. Points with too few neighbors are noise.

faded dots are noise

Pick a point. Find its neighbors within distance eps. If there are at least minPts of them, this is a "core" point and it starts a cluster. Recursively grow the cluster by including any other point within eps of any point already in the cluster. Repeat for every unvisited point. Anything left over with too few neighbors is noise.

The "core" definition is what makes DBSCAN density-based. K-Means cares about distance to a centroid; DBSCAN cares about whether enough nearby points exist to call this region "dense." That subtle reframing is what lets it find arbitrary-shaped clusters and explicitly model outliers.

The cost: two parameters that interact in non-obvious ways. The epsilon-vs-minPts grid search is the practical equivalent of picking k for K-Means — same kind of guesswork, different shape.

The math

For a point p, define its eps-neighborhood:

N_ε(p) = { q : ‖p − q‖ ≤ ε }

p is a core point iff |N_ε(p)| ≥ minPts. Two core points are density-reachable if a chain of core points connects them via overlapping eps-neighborhoods. A cluster is a maximal set of density-reachable points plus all their (non-core) eps-neighbors. Anything else is noise.

The algorithm runs in O(n²) with brute-force neighbor search, O(n log n) with a spatial index.


Shines

Non-convex clusters

Try the rings preset. K-Means slices the plane with straight lines and gets the wrong answer. DBSCAN follows density and finds the two rings as separate clusters. Anything spatial that isn't roughly globular is DBSCAN's territory.

Shines

Noise as signal

Outlier detection comes free. The points labeled noise are points DBSCAN couldn't fit into any dense region — frequently exactly the points you want to flag. Fraud detection, sensor anomaly, industrial QC.

Breaks

Variable-density clusters

One eps for the whole dataset. If you have a dense cluster nested inside a sparse one, eps either splits the dense cluster or merges everything together. HDBSCAN was specifically designed to handle this.

Breaks

High dimensions

Distance metrics degrade as dimensions climb. By 50 dimensions, the difference between "near" and "far" collapses, and DBSCAN's eps neighborhood becomes meaningless. Dimensionality-reduce first (UMAP, PCA) or use a model that doesn't depend on geometry.


INFERENCE ACCURACY TRAINING SIZE
  • Inference0.60
  • Accuracy0.70
  • Training0.65
  • Small size0.80

Astronomical surveys at LSST and the Sloan Digital Sky Survey. DBSCAN is the workhorse for finding stellar streams, galaxy clusters, and tidal tails in catalogs of billions of objects. Density — not distance to a hand-picked centroid — is the natural language of sky structure. The "noise" label happens to be where most of the interesting transient phenomena hide.


Try the wizard again →