FIG. 13 · ATLASDBSCAN
Clusters without
committing to k.
Density-based clustering. Points in dense neighborhoods form clusters. Sparse points get labeled noise — and that's a useful signal, not a failure mode. No need to pick k. Two parameters take its place.
Move eps to change the neighborhood radius. Move minPts to change how many points constitute a "dense" region. Faded gray dots are noise — points that didn't make it into any cluster. Try the rings preset to feel where K-Means stalls and DBSCAN does the obvious thing.
§ IDensity, drawn live
Points within eps of a core point join that cluster. Points with too few neighbors are noise.
§ IIHow it works
Pick a point. Find its neighbors within distance eps. If there are at least minPts of them, this is a "core" point and it starts a cluster. Recursively grow the cluster by including any other point within eps of any point already in the cluster. Repeat for every unvisited point. Anything left over with too few neighbors is noise.
The "core" definition is what makes DBSCAN density-based. K-Means cares about distance to a centroid; DBSCAN cares about whether enough nearby points exist to call this region "dense." That subtle reframing is what lets it find arbitrary-shaped clusters and explicitly model outliers.
The cost: two parameters that interact in non-obvious ways. The epsilon-vs-minPts grid search is the practical equivalent of picking k for K-Means — same kind of guesswork, different shape.
The math
For a point p, define its eps-neighborhood:
N_ε(p) = { q : ‖p − q‖ ≤ ε }p is a core point iff |N_ε(p)| ≥ minPts. Two core points are density-reachable if a chain of core points connects them via overlapping eps-neighborhoods. A cluster is a maximal set of density-reachable points plus all their (non-core) eps-neighbors. Anything else is noise.
The algorithm runs in O(n²) with brute-force neighbor search, O(n log n) with a spatial index.
§ IIIWhere it shines, where it breaks
Non-convex clusters
Try the rings preset. K-Means slices the plane with straight lines and gets the wrong answer. DBSCAN follows density and finds the two rings as separate clusters. Anything spatial that isn't roughly globular is DBSCAN's territory.
Noise as signal
Outlier detection comes free. The points labeled noise are points DBSCAN couldn't fit into any dense region — frequently exactly the points you want to flag. Fraud detection, sensor anomaly, industrial QC.
Variable-density clusters
One eps for the whole dataset. If you have a dense cluster nested inside a sparse one, eps either splits the dense cluster or merges everything together. HDBSCAN was specifically designed to handle this.
High dimensions
Distance metrics degrade as dimensions climb. By 50 dimensions, the difference between "near" and "far" collapses, and DBSCAN's eps neighborhood becomes meaningless. Dimensionality-reduce first (UMAP, PCA) or use a model that doesn't depend on geometry.
§ IVTrade-off scorecard
- Inference0.60
- Accuracy0.70
- Training0.65
- Small size0.80
§ VIn production
Astronomical surveys at LSST and the Sloan Digital Sky Survey. DBSCAN is the workhorse for finding stellar streams, galaxy clusters, and tidal tails in catalogs of billions of objects. Density — not distance to a hand-picked centroid — is the natural language of sky structure. The "noise" label happens to be where most of the interesting transient phenomena hide.