Silhouette scores at k=2 are almost always the highest, and almost always meaningless. The segmentation lab on this site used to show k=2 as the “winner” on most slices because the chart faithfully reported the score. The chart has been updated. The data has not.
The math behind it is unforgiving. For each point, silhouette compares a, the mean distance to points in its own cluster, against b, the mean distance to points in the nearest other cluster: s = (b - a) / max(a, b), bounded above by 1. With only two centroids, every point has exactly one “other cluster,” and that cluster is almost surely far away if you split the data along its strongest principal direction, so b dwarfs a and k=2 wins by structural accident. Increase k and you start putting clusters near each other: b shrinks, separation drops, the score drops, even when k=4 is the actual answer.
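The accident is easy to reproduce. A minimal sketch, with invented data: four tight blobs arranged as two widely separated pairs, scored once with the “true” four-cluster labels and once with the degenerate two-cluster split. The geometry here is an assumption chosen to make the effect obvious, not anything from the lab's datasets.

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

def blob(cx, cy, n=50):
    # A tight Gaussian blob centered at (cx, cy).
    return rng.normal([cx, cy], 0.3, size=(n, 2))

# Four real clusters, arranged as two pairs separated by a huge gap.
X = np.vstack([blob(0, 0), blob(3, 0), blob(100, 0), blob(103, 0)])

labels_k4 = np.repeat([0, 1, 2, 3], 50)   # the actual structure
labels_k2 = np.repeat([0, 1], 100)        # split along the dominant axis

s2 = silhouette_score(X, labels_k2)
s4 = silhouette_score(X, labels_k4)
# k=2 scores higher: its only "other cluster" is ~100 units away,
# so b dwarfs a even though each k=2 cluster lumps two blobs together.
print(f"k=2: {s2:.3f}  k=4: {s4:.3f}")
```

Both partitions are internally consistent; the k=2 score is higher purely because the nearest-other-cluster term is inflated by the gap.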
What this means in practice is that silhouette is a relative metric you read past k=2, not a global optimum you read at the maximum. Davies-Bouldin behaves the same way. Calinski-Harabasz is slightly more honest but has its own monotonicity issues at small k. The cleanest answer is to plot the curve, ignore everything below your minimum-viable k, and look at where the curve flattens rather than where it peaks.
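The reading rule above can be sketched as a few lines of Python. The threshold value and the helper name are mine, not the lab's; the point is only that the peak at k=2 never enters the comparison.

```python
def pick_k(scores: dict[int, float], min_k: int = 3, flat_tol: float = 0.02) -> int:
    """Pick k by flattening, not by peak: ignore everything below min_k,
    then return the first k after which the marginal gain drops below flat_tol."""
    ks = sorted(k for k in scores if k >= min_k)
    for k in ks[:-1]:
        if scores[k + 1] - scores[k] < flat_tol:
            return k
    return ks[-1]

# A typical curve: the k=2 peak is structural, the real structure is at k=4.
curve = {2: 0.91, 3: 0.55, 4: 0.62, 5: 0.63, 6: 0.63}
print(pick_k(curve))  # → 4
```

The same rule applies to Davies-Bouldin with the inequality flipped, since lower is better there.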
The lab now shades the k=2 region as degenerate and labels it as such. The dots are still drawn, with reduced opacity, so a careful reader sees that the score there is real but disqualified. The minimum-viable k is configurable per dataset; on the toy panels it is 3, on most real customer panels it is 4 or 5.
The fix is one extra parameter on the methodology contract and fifteen lines of SVG. The full diff is in notebooks/segmentation_model.py and assets/js/segmentation-lab.js. The right call is to keep the chart technically correct and add the annotation that protects the reader from the easy misread.
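The shape of that annotation, sketched in Python rather than the lab's actual JavaScript: a shaded band over the degenerate region plus demoted opacity on the disqualified dots. The function names, the x-scale, and the specific opacity values are hypothetical.

```python
def degenerate_band(x_for_k, min_viable_k: int, height: int = 200) -> str:
    """Returns an SVG fragment shading everything left of min_viable_k."""
    x = x_for_k(min_viable_k)
    return (
        f'<rect x="0" y="0" width="{x}" height="{height}" '
        f'fill="#888" fill-opacity="0.15"/>'
        f'<text x="{x / 2}" y="14" text-anchor="middle" '
        f'font-size="10">degenerate</text>'
    )

def score_dot(x: float, y: float, disqualified: bool) -> str:
    """A score marker: still drawn, but visibly demoted when disqualified."""
    opacity = 0.35 if disqualified else 1.0
    return f'<circle cx="{x}" cy="{y}" r="3" opacity="{opacity}"/>'

# With a linear x-scale of 40px per k and min-viable k of 3:
print(degenerate_band(lambda k: k * 40, 3))
```

The dots stay in the chart because the scores are real; only the opacity and the label tell the reader they are disqualified.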
Charts that show the truth and let the user draw the wrong conclusion aren’t honest. They’re just compliant.