Model Overview

This page orients you to the model families that show up most often in interviews and in practice. Use it as a quick decision guide, then jump into the focused pages for details and pitfalls.

What to know cold:

  • Bias–variance trade‑off and how it manifests across families
  • Regularization knobs (L1/L2, early stopping, depth, learning rate)
  • Data/feature requirements (scaling, sparsity, linear separability)
  • Calibration and thresholding vs. raw scores (see the sketch after this list)
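
A minimal sketch of the last point, assuming scikit-learn: work with `predict_proba` scores and pick a threshold that fits your costs instead of relying on the default 0.5 cutoff from `predict`. The 0.3 threshold and `C=0.5` below are illustrative, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# L2 regularization strength is set via C (smaller C = stronger penalty).
clf = LogisticRegression(C=0.5, max_iter=1000).fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]   # scores in [0, 1]
preds = (proba >= 0.3).astype(int)      # custom threshold, not the default 0.5
```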

Common families and when to reach for them:

  • Linear/Logistic Regression: strong, fast, interpretable baselines; scale features when regularizing; works best when relationships are roughly linear.
  • Tree‑based Methods (DT/RF/GBM): minimal preprocessing; handle nonlinearity and mixed feature types well; tune tree depth and learning rate to control overfitting.
  • Naive Bayes: very fast text baseline; the independence assumption is often “good enough” with sparse features.
  • SVM: strong on small‑to‑medium tabular datasets; requires feature scaling (see the pipeline sketch after this list); kernel choice matters.
  • PCA: unsupervised dimensionality reduction; fit only on training folds to avoid leakage (see the pipeline sketch after this list).
  • k‑Means: quick clustering; assumes roughly spherical, similarly sized clusters; standardize features first (see the second sketch after this list).
  • Ensembles: bagging reduces variance, boosting reduces bias, and stacking combines complementary models to push performance.
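
A minimal sketch of the scaling and leakage points above, assuming scikit-learn: keeping `StandardScaler` and `PCA` inside a `Pipeline` means each cross-validation fold refits them on its training split only. The dataset, `n_components=10`, and RBF kernel are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Scaler and PCA are refit per fold inside cross_val_score, so no test
# data leaks into the fitted components.
pipe = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="rbf"))
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Fitting PCA (or the scaler) on the full dataset before splitting would leak test-set statistics into training, which the pipeline structure rules out by construction.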
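A second sketch, for the k‑means point: standardize first so no single feature dominates the Euclidean distances. Synthetic blobs and `n_clusters=3` are assumptions for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Standardization puts all features on a comparable scale before clustering.
km = make_pipeline(StandardScaler(), KMeans(n_clusters=3, n_init=10, random_state=0))
labels = km.fit_predict(X)
```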

Next steps:

  • See the dedicated pages for tuning, diagnostics, and interview prompts:
    • /docs/machine_learning/model/linear-and-logistic-regression/
    • /docs/machine_learning/model/tree-based-methods/
    • /docs/machine_learning/model/bayes-theorem-and-naive-bayes/
    • /docs/machine_learning/model/support-vector-machines/
    • /docs/machine_learning/model/principal-component-analysis-pca/
    • /docs/machine_learning/model/k-means-clustering/
    • /docs/machine_learning/model/ensemble-learning/