🎯 Overfitting vs Underfitting in Machine Learning

Understanding these two common problems is key to building accurate, generalizable models.

1. 📈 Overfitting

🔍 What is it?

Overfitting happens when a model learns the training data too closely, memorizing noise and outliers along with the real signal, so it fails to generalize to new, unseen data.

⚠️ Signs of Overfitting:

  • Very high accuracy on training data
  • Poor performance on validation/test data

📊 Example:

Imagine fitting a complex curve to a small scatterplot: it matches every point perfectly, but it's useless for future predictions.
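
To see this in numbers, here's a minimal sketch in scikit-learn: a degree-9 polynomial pinned to 10 noisy points. The synthetic sine dataset, the polynomial degree, and the noise level are all illustrative assumptions, not taken from any particular study.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Tiny synthetic dataset: 10 noisy samples of a sine wave (illustrative).
rng = np.random.default_rng(0)
X_train = np.sort(rng.uniform(0, 1, 10)).reshape(-1, 1)
y_train = np.sin(2 * np.pi * X_train).ravel() + rng.normal(0, 0.1, 10)
X_test = np.linspace(0, 1, 50).reshape(-1, 1)
y_test = np.sin(2 * np.pi * X_test).ravel()

# A degree-9 polynomial has enough capacity to pass through all 10 points.
model = make_pipeline(PolynomialFeatures(degree=9), LinearRegression())
model.fit(X_train, y_train)

# The overfitting signature: near-perfect training score, poor test score.
print("train R^2:", model.score(X_train, y_train))  # close to 1.0
print("test  R^2:", model.score(X_test, y_test))    # typically far lower
```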

🛠️ How to Fix:

  • Use simpler models
  • Apply regularization (L1/L2), sketched after this list
  • Prune decision trees
  • Use more training data
  • Apply dropout (in neural networks)
  • Use cross-validation
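
As a sketch of the regularization fix from the list above, here's the same degree-9 pipeline with L2 regularization (Ridge) swapped in for plain least squares. The alpha value is an illustrative assumption; in practice you'd tune it with cross-validation.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

# Same synthetic sine data as the overfitting sketch above.
rng = np.random.default_rng(0)
X_train = np.sort(rng.uniform(0, 1, 10)).reshape(-1, 1)
y_train = np.sin(2 * np.pi * X_train).ravel() + rng.normal(0, 0.1, 10)
X_test = np.linspace(0, 1, 50).reshape(-1, 1)
y_test = np.sin(2 * np.pi * X_test).ravel()

# Ridge adds an L2 penalty that shrinks the polynomial coefficients,
# trading a little training accuracy for better generalization.
regularized = make_pipeline(PolynomialFeatures(degree=9), Ridge(alpha=0.01))
regularized.fit(X_train, y_train)

print("train R^2:", regularized.score(X_train, y_train))
print("test  R^2:", regularized.score(X_test, y_test))
```

The other fixes apply the same pressure in different ways: lowering the degree is "use a simpler model", and dropout plays a comparable role inside neural networks.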

2. 📉 Underfitting

🔍 What is it?

Underfitting happens when a model is too simple to learn the underlying patterns in the data, so it performs poorly on both the training and test sets.

⚠️ Signs of Underfitting:

  • Low accuracy on both training and validation data
  • Model doesn't improve even with more data

📊 Example:

Fitting a straight line to data that clearly follows a curve: the model can't capture the pattern.
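
Here's a minimal sketch of exactly that situation: a straight line fit to clearly quadratic data. The synthetic dataset is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data that clearly follows a curve: y ~ x^2 plus noise.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, 200).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 0.5, 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A straight line cannot express the quadratic pattern.
line = LinearRegression().fit(X_train, y_train)

# The underfitting signature: low scores on *both* splits.
print("train R^2:", line.score(X_train, y_train))
print("test  R^2:", line.score(X_test, y_test))
```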

🛠️ How to Fix:

  • Use a more complex model
  • Add more features (feature engineering), sketched after this list
  • Reduce regularization
  • Train for longer (more epochs in neural networks)
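
Continuing the sketch above, here's the feature-engineering fix: a squared feature (via a degree-2 polynomial pipeline) gives the model enough capacity to capture the curve. The setup mirrors the underfitting sketch and is equally illustrative.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same synthetic quadratic data as the underfitting sketch above.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, 200).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 0.5, 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Degree-2 features let a linear model express the quadratic pattern.
curve = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
curve.fit(X_train, y_train)

# Both scores rise together once the model can express the true pattern.
print("train R^2:", curve.score(X_train, y_train))
print("test  R^2:", curve.score(X_test, y_test))
```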

🧠 Quick Comparison Table:

                     Overfitting                        Underfitting
Model Complexity     Too complex                        Too simple
Training Accuracy    High                               Low
Test Accuracy        Low                                Low
Generalization       Poor                               Poor
Solution             Simplify model, regularization     Increase complexity, better features

📌 Pro Tip: Use the Bias-Variance Tradeoff to find the sweet spot!

  • High bias = underfitting
  • High variance = overfitting
  • Ideal models strike a balance.
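
One way to hunt for that balance is to sweep model complexity and watch the cross-validated score, as in this minimal sketch (the noisy sine data and the specific degrees are illustrative assumptions):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Noisy samples of a sine wave (illustrative synthetic data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 60).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 60)

# Sweep complexity: low degrees underfit (high bias),
# very high degrees overfit (high variance).
for degree in (1, 3, 9, 15):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"degree {degree:2d}: mean CV R^2 = {score:.3f}")
```

The degree where the cross-validated score peaks, typically a moderate one here, is the sweet spot between the two failure modes.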
