The First Decision: What Type of Problem Is This?

Before selecting an algorithm, identify the problem type. This sounds obvious, but it's the most common source of ML project misdirection. "Predict customer churn" could be classification (will they churn? yes/no), regression (how many days until churn?), or survival analysis (what's the probability of churn in each future time period?). Each formulation leads to different algorithms, different evaluation metrics, and different integration patterns. The right formulation comes from the business question, not from the data.

| Problem Type | Output | Business Questions | Evaluation Metric |
| --- | --- | --- | --- |
| Classification | Category (yes/no, A/B/C) | Will this customer churn? Is this transaction fraudulent? Which segment? | AUC, F1, precision, recall |
| Regression | Continuous number | How much revenue? What price? How many units? | RMSE, MAE, MAPE, R² |
| Clustering | Group assignment | What segments exist? Which items are similar? What's anomalous? | Silhouette, business validation |
| NLP | Text understanding/generation | What's the sentiment? What's the intent? Summarize this. | Task-specific (accuracy, BLEU, ROUGE) |
| Computer Vision | Image understanding | What's in this image? Is this defective? Where is the damage? | mAP, IoU, accuracy |
| Time Series | Future value(s) | What will demand be? What's the forecast? When will it fail? | MAPE, RMSE, coverage |
Formulation Drives Everything

Spend time getting the problem formulation right before touching algorithms. A misformulated problem — classification when the business needs regression, or point prediction when the business needs a probability distribution — wastes the entire model development cycle. The right formulation comes from three questions: What does the business need to know? What decision does it inform? What form does the answer need to take?

Classification: When the Answer Is a Category

Classification predicts which category an entity belongs to — churn/not-churn, fraud/legitimate, high-risk/medium-risk/low-risk. The output is a probability per class; the decision threshold converts probability to action.

Algorithm Selection for Classification

| Algorithm | Best For | Strengths | Limitations |
| --- | --- | --- | --- |
| Logistic Regression | Baseline, interpretable models, linear relationships | Fast training, fully interpretable, well-calibrated probabilities | Can't capture non-linear relationships without feature engineering |
| Random Forest | Moderate datasets, minimal tuning needed | Handles non-linearity, resistant to overfitting, feature importance | Slower inference than linear, less accurate than boosting |
| XGBoost / LightGBM | Most tabular classification problems | Best accuracy on tabular data, handles missing values, fast training | Requires tuning, less interpretable than linear |
| Neural Networks | High-dimensional data, complex interactions | Learns complex patterns, handles unstructured data (text, image) | Needs large data, expensive training, less interpretable |

The default for enterprise tabular classification: XGBoost or LightGBM. Start here. Move to logistic regression if interpretability is critical (regulatory requirement, explainable decisions). Move to neural networks only if tabular models plateau and you have >100K training examples with complex interaction patterns.

Threshold Selection

The model outputs probabilities. The threshold converts probability to decision — above 0.5, predict churn; below, predict retain. But 0.5 is rarely the right threshold. The optimal threshold depends on the cost of errors: if a false negative (missed churn) costs $5,000 and a false positive (unnecessary intervention) costs $200, the threshold should be much lower than 0.5 — catching more churners at the cost of some unnecessary interventions. The threshold is a business decision, not a statistical one.
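The cost-based reasoning above can be made explicit with a small sweep over candidate thresholds. This sketch uses the illustrative costs from the text ($5,000 per missed churner, $200 per unnecessary intervention) and hypothetical model scores:

```python
# Cost-based threshold selection: pick the threshold minimizing expected cost.
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical labels and model scores for 1,000 customers (20% churn rate).
y_true = rng.random(1000) < 0.2
scores = np.clip(y_true * 0.35 + rng.random(1000) * 0.6, 0, 1)

COST_FN, COST_FP = 5000, 200  # missed churner vs. unnecessary intervention
thresholds = np.linspace(0.01, 0.99, 99)
costs = []
for t in thresholds:
    pred = scores >= t
    fn = np.sum(y_true & ~pred)   # missed churners
    fp = np.sum(~y_true & pred)   # unnecessary interventions
    costs.append(fn * COST_FN + fp * COST_FP)

best = thresholds[int(np.argmin(costs))]
print(f"cost-optimal threshold: {best:.2f}")  # well below 0.5 with this cost ratio
```

Because the 25:1 cost asymmetry penalizes missed churners heavily, the sweep lands far below the naive 0.5 cutoff, matching the reasoning in the text.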

Regression: When the Answer Is a Number

Regression predicts a continuous value — revenue forecast, customer lifetime value, estimated claim cost, property valuation, time to completion.

Algorithm Selection for Regression

| Algorithm | Best For | Key Characteristic |
| --- | --- | --- |
| Linear / Ridge / Lasso | Baseline, interpretable, linear relationships | Coefficients are directly interpretable as "for each unit increase in X, Y changes by..." |
| XGBoost / LightGBM Regressor | Most tabular regression | Best accuracy on structured data, handles non-linearity automatically |
| Elastic Net | Many correlated features | Combines L1 and L2 regularization, performs feature selection |
| Quantile Regression | Prediction intervals needed | Predicts percentiles (10th, 50th, 90th) instead of the mean, producing prediction ranges |

For regression, always report prediction intervals, not just point estimates. "Revenue forecast: $12.4M" is less useful than "Revenue forecast: $12.4M (80% confidence: $11.2M-$13.6M)." Decision-makers need to understand the uncertainty around predictions to make appropriate decisions. Quantile regression or conformal prediction provide these intervals.

Clustering: When There Is No Right Answer

Clustering finds natural groupings in data without a target variable — customer segmentation, document grouping, anomaly detection, market basket analysis. Unlike classification and regression (supervised — the model learns from labeled examples), clustering is unsupervised — the model discovers structure the analyst didn't define in advance.

Algorithm Selection for Clustering

| Algorithm | Best For | Requires # Clusters? | Handles Non-Spherical? |
| --- | --- | --- | --- |
| K-Means | Large datasets, spherical clusters | Yes (k must be specified) | No |
| DBSCAN | Arbitrary shapes, noise detection | No (density-based) | Yes |
| Hierarchical | Small-medium datasets, exploring cluster structure | No (cut dendrogram at desired level) | Yes |
| Gaussian Mixture | Soft assignments, probabilistic clustering | Yes (k components) | Partially (ellipsoidal) |

The clustering trap: silhouette score isn't enough. Clustering quality must be validated by the business — do the segments make sense? Are they actionable? Can the marketing team design different strategies for each segment? A clustering with perfect silhouette score but no business interpretation is mathematically valid and practically useless.
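The statistical half of that validation, choosing k by silhouette score, looks like this sketch (synthetic RFM-like data; the business review the text demands happens after, not in code):

```python
# Choose k for K-Means by silhouette score on synthetic "customer segments".
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(3)
# Three well-separated synthetic segments in a 3-feature (RFM-like) space.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 3)) for c in (0, 3, 6)])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(f"best k by silhouette: {best_k}")
```

The score only confirms geometric separation; whether the resulting segments are actionable is still a question for the marketing team.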

NLP: When the Input Is Text

NLP tasks in enterprise ML: text classification (email → complaint/inquiry/request), named entity recognition (contract → extract parties, dates, amounts), sentiment analysis (review → positive/negative/neutral), summarization (document → key points), and question answering (knowledge base → answer specific questions).

Model Selection for NLP

Traditional ML (TF-IDF + classification): For simple text classification with labeled training data. Fast, interpretable, sufficient for many enterprise use cases (email routing, ticket categorization). Works with 1,000-10,000 labeled examples.
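The whole traditional pipeline fits in a few lines of scikit-learn. This sketch uses six toy emails standing in for the 1,000-10,000 labeled examples a real deployment would need:

```python
# TF-IDF + logistic regression baseline for email routing (toy data).
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "my invoice is wrong and I want a refund",
    "I am very unhappy with the broken product",
    "what are your opening hours",
    "how do I reset my password",
    "please upgrade my plan to premium",
    "I would like to add another user seat",
]
labels = ["complaint", "complaint", "inquiry", "inquiry", "request", "request"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
pred = clf.predict(["the product arrived damaged, this is unacceptable"])[0]
print(pred)
```

Interpretability comes for free: the logistic regression coefficients over the TF-IDF vocabulary show exactly which words drive each routing decision.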

Pre-trained transformers (BERT, RoBERTa) with fine-tuning: For nuanced text understanding — sentiment that depends on context, intent that requires world knowledge, classification that requires understanding relationships between words. Fine-tune on 500-5,000 domain-specific examples. Significantly more accurate than TF-IDF for complex text tasks.

Large Language Models (GPT, Claude) with prompting: For tasks where labeled training data is scarce or unavailable. Zero-shot and few-shot prompting can classify, extract, and summarize without fine-tuning. Best for: prototyping, low-volume tasks, and tasks that benefit from general world knowledge. Limitation: higher inference cost, latency, and the need for prompt engineering.

Computer Vision: When the Input Is an Image

Enterprise computer vision: quality inspection (detect defects on manufacturing line), document processing (extract data from scanned documents), damage assessment (evaluate insurance claims from photos), security (facial recognition, access control), and medical imaging (detect anomalies in X-rays, CT scans).

Pre-trained CNNs with transfer learning (ResNet, EfficientNet, ViT) are the standard approach. Train a pre-trained model on 500-5,000 labeled images from your domain. For most enterprise computer vision, transfer learning achieves production-grade accuracy — training from scratch requires 100,000+ images and is rarely necessary.

Time Series: When the Pattern Is Temporal

Time series forecasting predicts future values based on historical patterns — demand forecasting, revenue projection, capacity planning, equipment degradation. Time series problems have unique challenges: seasonality (repeating patterns at fixed intervals), trend (long-term direction), and autocorrelation (each value depends on previous values).

Algorithm Selection for Time Series

| Algorithm | Best For | Handles Seasonality? | Handles External Variables? |
| --- | --- | --- | --- |
| ARIMA / SARIMA | Univariate, single series | SARIMA: yes | ARIMAX: yes |
| Prophet | Business time series with holidays, changepoints | Yes (multiple) | Yes (regressors) |
| XGBoost (lagged features) | Many series, cross-series features | Via features | Yes (any feature) |
| LSTM / Temporal Fusion | Complex dependencies, long horizons | Learned | Yes |

For enterprise demand forecasting across many SKUs/locations: XGBoost with engineered temporal features (lags, moving averages, calendar features) typically outperforms deep learning — faster to train, easier to explain, and more performant at the SKU level where each series has limited history. Reserve LSTM and transformer-based models for problems with long-range dependencies and large training sets.
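The feature-engineering step that makes this work is converting the series into a supervised table. This sketch builds lag, moving-average, and calendar features from a synthetic monthly demand series; the specific lags and window are illustrative choices:

```python
# Turn a demand series into a supervised table for a tree-based forecaster.
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(120)  # 10 years of monthly demand: trend + seasonality + noise
demand = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=3, size=120)

LAGS = (1, 2, 12)  # last month, two months back, same month last year
rows, targets = [], []
for i in range(max(LAGS) + 3, len(demand)):
    lag_feats = [demand[i - lag] for lag in LAGS]
    ma3 = demand[i - 3:i].mean()   # 3-month moving average
    month = i % 12                 # calendar feature
    rows.append(lag_feats + [ma3, month])
    targets.append(demand[i])

X, y = np.array(rows), np.array(targets)
print(X.shape, y.shape)  # one row per forecastable period, 5 features each
```

The resulting `(X, y)` pair feeds directly into any tabular regressor, and the same feature code can be applied across thousands of SKU/location series at once.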

The Complexity Ladder: Start Simple, Add Complexity Only When Needed

1. Baseline: Simple Rules or Heuristics

Before any ML: what does the simplest possible approach achieve? Average of last 3 months for forecasting. "All customers churn" as a naive baseline. Rules-based scoring for fraud. The ML model must beat this baseline — otherwise, the investment isn't justified.
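For the forecasting case, the rung-1 benchmark is a few lines of numpy. The data here is synthetic; the point is that this number becomes the bar any ML model must clear:

```python
# Naive "average of last 3 months" forecasting baseline.
import numpy as np

rng = np.random.default_rng(5)
series = 50 + np.cumsum(rng.normal(size=36))  # 36 months of synthetic revenue

# Forecast each month as the mean of the previous three, then score it.
preds = np.array([series[i - 3:i].mean() for i in range(3, len(series))])
actuals = series[3:]
baseline_mae = np.abs(preds - actuals).mean()
print(f"naive baseline MAE: {baseline_mae:.2f}")
```

If a tuned XGBoost model can't beat `baseline_mae` on held-out data, the ML investment isn't justified yet.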

2. Linear Models

Logistic regression, linear regression, elastic net. Fast to train, fully interpretable, sufficient for many enterprise use cases. If linear models achieve the accuracy threshold, stop here — the interpretability and operational simplicity are worth more than marginal accuracy improvements.

3. Gradient Boosting

XGBoost, LightGBM, CatBoost. The sweet spot for most tabular enterprise ML — high accuracy, handles non-linearity, reasonable training time, feature importance for interpretability. This is where 80% of enterprise tabular ML should land.

4. Deep Learning

Neural networks, transformers, CNNs. Use only for: unstructured data (text, images, audio), very large datasets (100K+ examples), or tabular problems where gradient boosting plateaus and the accuracy gap is worth the infrastructure investment. Deep learning costs 5-10x more to train and serve than gradient boosting — justify the cost with measured accuracy improvement.

"The best model is the simplest model that meets the accuracy threshold. Every additional layer of complexity adds training cost, serving cost, debugging difficulty, and operational risk — justify each layer with measured improvement." — Xylity ML Engineering Practice

The Selection Decision Matrix

Use this matrix to select the starting algorithm for each enterprise ML use case. Start at the recommended algorithm. Escalate to higher complexity only if the starting algorithm doesn't meet the accuracy threshold after thorough feature engineering.

| Use Case | Problem Type | Start With | Escalate To | Data Requirement |
| --- | --- | --- | --- | --- |
| Customer churn | Binary classification | XGBoost + RFM features | Neural net if >500K customers | 12+ months behavioral data |
| Fraud detection | Anomaly + classification | XGBoost + isolation forest | Graph neural net for network fraud | Labeled fraud cases + transaction history |
| Demand forecast | Time series regression | XGBoost + temporal features | Temporal fusion transformer | 3+ years at required granularity |
| CLV prediction | Regression | XGBoost + RFM + tenure | Probabilistic (BG/NBD + Gamma-Gamma) | 2+ years transaction history |
| Customer segments | Clustering | K-Means on RFM | Gaussian mixture + behavioral features | Sufficient volume per segment (100+ per cluster) |
| Text classification | Multi-class classification | TF-IDF + logistic regression | Fine-tuned BERT | 1,000+ labeled examples per class |
| Quality inspection | Image classification | Transfer learning (ResNet) | Custom CNN + object detection | 500+ labeled images per defect type |
| Predictive maintenance | Time-to-event / classification | XGBoost + sensor features | LSTM if long temporal dependencies | Labeled failure events + sensor history |

The Xylity Approach

We select models through the complexity ladder — start simple, measure, escalate only when the accuracy gap justifies the infrastructure cost. Our data scientists develop models; our ML engineers deploy them to production. The output: the right model at the right complexity level, deployed and monitored for sustained accuracy. Machine learning consulting that produces production systems, not research papers.
