Model Overview

This page orients you to the model families that show up most often in interviews and in practice. Use it as a quick decision guide, then jump into the focused pages for details and pitfalls.

What to know cold:

  • Bias–variance trade‑off and how it manifests across families
  • Regularization knobs (L1/L2, early stopping, depth, learning rate)
  • Data/feature requirements (scaling, sparsity, linear separability)
  • Calibration and thresholding vs. raw scores (see the sketch after this list)
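
A minimal sketch of the last point, assuming scikit-learn: work with `predict_proba` scores and pick a threshold that fits your costs instead of relying on the default 0.5 cutoff from `predict`. The 0.3 threshold and `C=0.5` below are illustrative, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# L2 regularization strength is set via C (smaller C = stronger penalty).
clf = LogisticRegression(C=0.5, max_iter=1000).fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]   # scores in [0, 1]
preds = (proba >= 0.3).astype(int)      # custom threshold, not the default 0.5
```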

Common families and when to reach for them:

  • Linear/Logistic Regression: strong, fast, interpretable baselines; scale features when regularizing; works best when relationships are roughly linear.
  • Tree‑based Methods (DT/RF/GBM): minimal preprocessing; handle nonlinearity and mixed feature types well; tune tree depth and learning rate to control overfitting.
  • Naive Bayes: very fast text baseline; the independence assumption is often “good enough” with sparse features.
  • SVM: strong on small‑to‑medium tabular datasets; requires feature scaling (see the pipeline sketch after this list); kernel choice matters.
  • PCA: unsupervised dimensionality reduction; fit only on training folds to avoid leakage (see the pipeline sketch after this list).
  • k‑Means: quick clustering; assumes roughly spherical, similarly sized clusters; standardize features first (see the second sketch after this list).
  • Ensembles: bagging reduces variance, boosting reduces bias, and stacking combines complementary models to push performance.
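
A minimal sketch of the scaling and leakage points above, assuming scikit-learn: keeping `StandardScaler` and `PCA` inside a `Pipeline` means each cross-validation fold refits them on its training split only. The dataset, `n_components=10`, and RBF kernel are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Scaler and PCA are refit per fold inside cross_val_score, so no test
# data leaks into the fitted components.
pipe = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="rbf"))
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Fitting PCA (or the scaler) on the full dataset before splitting would leak test-set statistics into training, which the pipeline structure rules out by construction.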
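A second sketch, for the k‑means point: standardize first so no single feature dominates the Euclidean distances. Synthetic blobs and `n_clusters=3` are assumptions for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Standardization puts all features on a comparable scale before clustering.
km = make_pipeline(StandardScaler(), KMeans(n_clusters=3, n_init=10, random_state=0))
labels = km.fit_predict(X)
```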

Next steps:

  • See the dedicated pages for tuning, diagnostics, and interview prompts:
    • /docs/machine_learning/model/linear-and-logistic-regression/
    • /docs/machine_learning/model/tree-based-methods/
    • /docs/machine_learning/model/bayes-theorem-and-naive-bayes/
    • /docs/machine_learning/model/support-vector-machines/
    • /docs/machine_learning/model/principal-component-analysis-pca/
    • /docs/machine_learning/model/k-means-clustering/
    • /docs/machine_learning/model/ensemble-learning/