FIG. 13 The Model Atlas
Pick the right model.
With reasons.
Answer five questions about your problem. The atlas routes you to one of thirteen models with a deep, live, interactive explanation. Every junior data scientist asks "what model should I use?" too late. This page inverts that.
The wizard below is opinionated. It is not a substitute for cross-validation, baseline comparisons, or honest evaluation against your actual data. It is a shape-of-the-problem heuristic — the conversation a senior would have with a junior on day one. Read the methodology section at the bottom for what it is and is not.
§ VI Browse the catalog directly
For readers who already know what they're looking for. Every card links to a deep destination page with a live, interactive demo.
§ VII Methodology & Honest Caveats
The wizard is a scoring function over a curated catalog. Question 1 (the task) is a hard filter — a regression problem will never be routed to a classifier. Questions 2 through 5 contribute weighted scores. The top score is the recommendation; ranks 2 and 3 are offered as "honest alternatives." The full rule set lives in model-decision-tree.js — ~150 lines, intentionally readable. Tweak it; the wizard recalculates instantly.
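The hard-filter-then-weighted-score pass can be sketched as below. This is a hypothetical miniature, not the actual contents of model-decision-tree.js — the catalog entries, question names, and weights here are illustrative stand-ins.

```javascript
// Hypothetical sketch of the wizard's scoring pass. Names and weights
// are illustrative; the real rule set lives in model-decision-tree.js.
const catalog = [
  { name: "logistic regression", task: "classification",
    scores: { smallData: 2, interactions: 0, interpretable: 3, lowLatency: 3 } },
  { name: "gradient boosting", task: "classification",
    scores: { smallData: 1, interactions: 3, interpretable: 1, lowLatency: 1 } },
  { name: "linear regression", task: "regression",
    scores: { smallData: 3, interactions: 0, interpretable: 3, lowLatency: 3 } },
];

function recommend(answers) {
  // Question 1 is a hard filter: only models matching the task survive.
  const candidates = catalog.filter(m => m.task === answers.task);
  // Questions 2-5 contribute weighted scores.
  const ranked = candidates
    .map(m => ({
      name: m.name,
      score: Object.entries(answers.weights)
        .reduce((sum, [q, w]) => sum + w * (m.scores[q] ?? 0), 0),
    }))
    .sort((a, b) => b.score - a.score);
  // Rank 1 is the recommendation; ranks 2 and 3 are honest alternatives.
  return { pick: ranked[0], alternatives: ranked.slice(1, 3) };
}

const result = recommend({
  task: "classification",
  weights: { smallData: 1, interactions: 0, interpretable: 2, lowLatency: 1 },
});
console.log(result.pick.name); // highest-scoring classifier under these weights
```

Because the weights live in one plain object, "tweak it and recalculate" is just editing numbers and re-running the function — no retraining, no build step.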
Real model selection is empirical. You build several candidates, cross-validate them on your data, and pick the one that wins on a metric you trust. The atlas is the conversation before that — a structured way to narrow from thirteen candidates to two or three, so the empirical work has fewer directions to spread across. "Best model" is a fiction. The right model is the one that survives evaluation on your data.
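The empirical step the atlas precedes looks roughly like this — a minimal k-fold cross-validation loop over a shortlist of candidates. `fitAndScore` is a hypothetical stand-in for real training and metric code; nothing here is part of the atlas itself.

```javascript
// Minimal sketch of empirical model selection: score each shortlisted
// candidate by k-fold cross-validation, keep the winner.
function crossValidate(fitAndScore, data, k = 5) {
  const foldSize = Math.floor(data.length / k);
  let total = 0;
  for (let i = 0; i < k; i++) {
    // Fold i is held out; everything else is training data.
    const test = data.slice(i * foldSize, (i + 1) * foldSize);
    const train = data.filter((_, j) => j < i * foldSize || j >= (i + 1) * foldSize);
    total += fitAndScore(train, test);
  }
  return total / k; // mean held-out score across folds
}

// Pick whichever candidate wins on held-out data, on a metric you trust.
function selectModel(candidates, data) {
  return candidates
    .map(c => ({ name: c.name, cv: crossValidate(c.fitAndScore, data) }))
    .sort((a, b) => b.cv - a.cv)[0];
}
```

The atlas's job is to keep the `candidates` array short — two or three entries instead of thirteen.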
Every destination page demonstrates its model on a 2D synthetic dataset. Two dimensions are everything you can see. They are not everything that matters — the curse of dimensionality is real and not visualizable. A model that handles 2D data perfectly may struggle in 200D. Read the per-model "when it fails" sections; they break the demo on purpose so you can feel where the boundaries are.
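You can feel the curse of dimensionality in a few lines. For random points in a unit hypercube, the gap between the nearest and farthest point shrinks relative to the distances themselves as dimension grows — "near" and "far" stop being meaningfully different, which is exactly what a 2D demo cannot show. This is a quick illustration, not part of any destination page.

```javascript
// Relative contrast between nearest and farthest of n random points,
// measured from one random reference point in a dim-dimensional unit cube.
function distanceSpread(dim, n = 200) {
  const rand = () => Array.from({ length: dim }, Math.random);
  const origin = rand();
  const dists = Array.from({ length: n }, () => {
    const p = rand();
    return Math.sqrt(p.reduce((s, x, i) => s + (x - origin[i]) ** 2, 0));
  });
  const min = Math.min(...dists), max = Math.max(...dists);
  return (max - min) / min; // large: near/far distinct; small: all equidistant
}
console.log(distanceSpread(2));   // typically large in 2D
console.log(distanceSpread(200)); // typically well under 1 in 200D
```

A k-nearest-neighbors model that looks crisp on the 2D demo is leaning on exactly the contrast that evaporates here.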
The inference, accuracy, training, and size ratings on every model card are directional, not exact. They reflect typical-case performance on typical-size data. Specific architectures, optimizations, and hardware can move any of these values significantly. Treat them as a map, not a benchmark.
FAQ
What does the Model Atlas do?
It's an expert system that recommends a machine learning model for your specific problem. Answer five questions (dataset size, target type, feature interactions, interpretability needs, latency tolerance) and the atlas routes you to one of 13 algorithms with reasons for the choice, plus a live, interactive demo of that algorithm running in your browser.
Which 13 ML models does it cover?
Linear regression, ridge and lasso, logistic regression, decision tree, random forest, gradient boosting, support vector machine, multi-layer perceptron, naive Bayes, k-nearest neighbors, k-means, DBSCAN, and isolation forest. Each has its own deep-dive page with hand-built interactive visualizations of how the algorithm trains and predicts.
Why an expert system instead of "just use XGBoost"?
XGBoost is the right answer for many problems, but not all. Linear models are still the right choice when interpretability matters, k-nearest neighbors when the dataset is small enough, isolation forest for anomaly detection, and DBSCAN when cluster count is unknown. The atlas surfaces those distinctions instead of letting them get lost behind one default.
How is the "right model" determined?
A small hand-built decision tree derived from the scikit-learn algorithm cheat-sheet plus practitioner heuristics, encoded as five branching questions. The output isn't algorithmic ground truth — it's a starting point and an explanation of why. Each recommendation includes its tradeoffs (for example, "favors interpretability over flexibility") and the algorithms you'd consider next if the recommendation doesn't fit.
Are the demos real or simulated?
Real. Each algorithm's deep-dive page implements that algorithm in vanilla JavaScript, runs it on synthetic or canonical datasets, and visualizes the training and inference loop live. No hidden API calls, no cached predictions. You can see gradient descent slide a logistic regression boundary into place, watch DBSCAN merge density clusters in real time, see decision trees split on feature thresholds, and so on.