FIG. 01 The Churn Model, In the Open
The customers about to leave.
A churn model, built and run in public. It finds the customers who are about to cancel so the company can save them. Every saved account is revenue already paid to acquire.
Every chart on this page is computed live in your browser from the same JSON files the Python script writes to /assets/data/churn/. No hosted inference, no screenshots, no hand-picked numbers.
§ I Why a churn model matters
Keeping a customer is cheaper than winning one back. About 1 in 5 bank customers will quietly leave in a given cycle. At roughly $500 of lost lifetime value per departing customer, that's $100,000 of revenue walking out the door for every 1,000 customers on the books — before anyone has a chance to intervene.
A churn model fixes that by scoring every customer on how likely they are to leave next. Retention teams then spend a small outreach budget on the customers most at risk, rather than blanketing everyone or (more commonly) finding out too late. Tuned against real dollars (see § VI), the model on this page cuts the expected loss from about $100,000 per 1,000 customers down to $28,000, a 72% reduction. On a 10,000-customer book that's roughly $720,000 saved every decision cycle, with no other change to the business.
Everything below is computed live in your browser from the linked Python script. The point here isn't absolute accuracy — it's the shape of a responsible workflow: compare several models, explain the winner, audit it for disparate impact, and tune the threshold against the dollars that actually matter.
§ II Data Profile
§ III Three models, one test set
Class imbalance is handled in every model (class_weight='balanced' for LR and RF, scale_pos_weight for XGB); a sketch of the weighting follows the table. The winner per column is highlighted.
| Model | Accuracy | Precision | Recall | F1 | ROC AUC | PR AUC |
|---|---|---|---|---|---|---|
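The weighting itself is a one-liner per model. A minimal sketch, assuming the X_train / y_train produced by the stratified split described in § X:

```python
# Minimal sketch of the imbalance handling named above; assumes X_train
# and y_train from the stratified 80/20 split in notebooks/churn_model.py.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# Churners are the minority class; each model is told to upweight them.
neg, pos = (y_train == 0).sum(), (y_train == 1).sum()

models = {
    "logreg": LogisticRegression(class_weight="balanced", max_iter=1000,
                                 random_state=42),
    "rf": RandomForestClassifier(class_weight="balanced", random_state=42),
    # XGBoost has no class_weight; scale_pos_weight = neg/pos is the analogue.
    "xgb": XGBClassifier(scale_pos_weight=neg / pos, random_state=42),
}
for model in models.values():
    model.fit(X_train, y_train)
```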
§ IV · FIG. 01.1 ROC curves — all three models overlaid
Fig. 01.1 — Higher = better. Diagonal would be a coin flip. Each curve traces the trade-off between catching churners (TPR) and wrongly flagging loyal customers (FPR) as the decision threshold sweeps from 1 to 0.
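For reference, each curve is a few lines of scikit-learn. A sketch, assuming one fitted model from § III and the held-out test set:

```python
# Trace one ROC curve: one (FPR, TPR) point per candidate threshold.
from sklearn.metrics import roc_auc_score, roc_curve

scores = model.predict_proba(X_test)[:, 1]        # P(churn) per customer
fpr, tpr, thresholds = roc_curve(y_test, scores)  # the curve in Fig. 01.1
print(f"ROC AUC = {roc_auc_score(y_test, scores):.3f}")
```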
§ V · FIG. 01.2 Score a customer, live
Inputs are standardized with the same means and standard deviations the scaler learned during training. The logit, probability, and per-feature contributions are all computed in your browser with the model's actual coefficients.
Live scoring uses the logistic regression model for interpretability. XGBoost performs better on aggregate metrics (see § III) but its prediction is a sum over 300 trees — not something a slider explains.
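The browser-side math is small enough to restate here. A sketch in Python, assuming the scaler's means and standard deviations and the LR coefficients exported to the JSON files:

```python
# The in-browser scoring math, reproduced in Python. Inputs are numpy
# arrays read from the exported JSON: per-feature means/stds from the
# training scaler, plus the logistic regression's coefs and intercept.
import numpy as np

def score_customer(x_raw, means, stds, coefs, intercept):
    """Standardize raw slider inputs, then apply the logistic regression."""
    z = (np.asarray(x_raw) - means) / stds   # same scaling as training
    contributions = coefs * z                # per-feature push on the logit
    logit = intercept + contributions.sum()
    prob = 1.0 / (1.0 + np.exp(-logit))      # sigmoid gives P(churn)
    return logit, prob, contributions
```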
§ VI · FIG. 01.3 Threshold tuning against dollars
Real models ship with a decision threshold, not just a score. Move the slider to see how precision, recall, and per-thousand-customer cost shift. The accent dot marks the cost-minimizing threshold.
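The sweep is a loop over thresholds with a dollar cost at each stop. A sketch follows; the $500 lost-value figure comes from § I, while the $50-per-contact outreach cost is an illustrative assumption, not a number from the script:

```python
# Sketch of the sweep behind Fig. 01.3. COST_CONTACT is an assumed
# outreach cost; COST_MISSED is the $500 lifetime value from § I.
import numpy as np

COST_MISSED, COST_CONTACT = 500, 50   # $/missed churner, $/contact (assumed)

def cost_per_thousand(y_true, scores, threshold):
    y_true = np.asarray(y_true)
    flagged = scores >= threshold
    missed = ~flagged & (y_true == 1)            # churners we fail to flag
    dollars = missed.sum() * COST_MISSED + flagged.sum() * COST_CONTACT
    return dollars * 1000 / len(y_true)

thresholds = np.arange(0.05, 0.96, 0.01)         # the 0.05 -> 0.95 sweep (§ X)
costs = [cost_per_thousand(y_test, scores, t) for t in thresholds]
best = thresholds[int(np.argmin(costs))]         # the accent dot
```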
§ VII · FIG. 01.4 Feature importance — mean |SHAP|, XGBoost
Tree SHAP decomposes each XGBoost prediction into per-feature contributions and averages their absolute values across a 500-row test sample. Read it as: which features move the score the most, regardless of direction.
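A sketch of that computation with the shap library, assuming the fitted XGBoost model xgb and the pandas test features X_test:

```python
# Tree SHAP over a 500-row test sample, then mean |SHAP| per feature.
import numpy as np
import shap

sample = X_test.sample(500, random_state=42)
shap_values = shap.TreeExplainer(xgb).shap_values(sample)  # (500, n_features)

# Mean |SHAP|: the average size of each feature's push on the score.
importance = np.abs(shap_values).mean(axis=0)
for name, value in sorted(zip(sample.columns, importance), key=lambda p: -p[1]):
    print(f"{name:20s} {value:.3f}")
```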
§ VIII · FIG. 01.5 SHAP dependence — top two features
X-axis: the raw feature value for one customer. Y-axis: how much that feature moved their churn score, positive or negative. Non-linearity shows up as curves — something a linear model can't see.
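Each panel is just raw value against SHAP contribution for one feature. A sketch reusing sample and shap_values from the § VII snippet; "Age" here is a placeholder, since § VII's chart shows which two features actually rank on top:

```python
# One dependence panel: raw feature value vs. that feature's SHAP value.
import matplotlib.pyplot as plt

feat = "Age"                                   # placeholder feature name
col = list(sample.columns).index(feat)
plt.scatter(sample[feat], shap_values[:, col], s=8)
plt.xlabel(f"{feat} (raw value)")
plt.ylabel("SHAP value (push on churn score)")
plt.show()
```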
§ IX · FIG. 01.6 Fairness audit — by Geography & Gender
Disparate impact is a live concern in regulated banking — the OCC and FDIC examine retention models for fair-lending proxies. This panel shows the same model sliced by protected-attribute-adjacent fields. Even on a public dataset, gaps show up.
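One way to compute the slice; a sketch assuming a hypothetical test-set DataFrame test_df that carries the raw Geography and Gender columns, the true Exited label, and the model's score per row:

```python
# Per-group flag rate and recall at a fixed threshold. Large gaps between
# groups are what the panel above makes visible.
THRESHOLD = 0.5   # or the cost-optimal threshold from § VI

for attr in ("Geography", "Gender"):
    for group, grp in test_df.groupby(attr):
        flagged = grp["score"] >= THRESHOLD
        churned = grp["Exited"] == 1
        recall = (flagged & churned).sum() / max(churned.sum(), 1)
        print(f"{attr}={group}: flag rate {flagged.mean():.2f}, "
              f"recall {recall:.2f}")
```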
§ X Methodology & Colophon
Kaggle · shrutimechlearn/churn-modelling ↗ — 10,000 synthetic bank customers across France, Germany, Spain.
notebooks/churn_model.py ↗ — drop IDs, label-encode Gender, one-hot Geography, stratified 80/20 split, StandardScaler on numeric features, three models, SHAP on XGB, fairness slice on the best model, threshold sweep 0.05 → 0.95.
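Those steps compress to a short block. A sketch, assuming the Kaggle CSV's standard column names and that it has been loaded as df:

```python
# Preprocessing as listed above: drop IDs, encode, split, scale.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = df.drop(columns=["RowNumber", "CustomerId", "Surname"])  # drop IDs
df["Gender"] = (df["Gender"] == "Male").astype(int)           # label-encode
df = pd.get_dummies(df, columns=["Geography"])                # one-hot

X, y = df.drop(columns=["Exited"]), df["Exited"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)         # stratified 80/20

numeric = ["CreditScore", "Age", "Tenure", "Balance",
           "NumOfProducts", "EstimatedSalary"]
scaler = StandardScaler().fit(X_train[numeric])               # fit on train only
X_train[numeric] = scaler.transform(X_train[numeric])
X_test[numeric] = scaler.transform(X_test[numeric])
```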
random_state=42 everywhere. Last regenerated —. Clone the repo and run python notebooks/churn_model.py to regenerate all nine JSON files.
Synthetic Kaggle dataset — small by production standards. Live scoring uses logistic regression for interpretability, which misses non-linear interactions the tree models catch (see § III for the gap). No retraining schedule or drift detection — both essential in anything that actually runs.