
Feature Scaling (Normalization & Standardization)


Feature Scaling in Machine Learning

Feature scaling puts numerical features on a comparable scale, which helps many models converge faster and can improve their accuracy.

Some models (like KNN, SVM, neural networks) can be seriously affected by unscaled data!

Why Scale Features?

  • Prevent features with large values from dominating the learning process.
  • Improve convergence in gradient-based methods (like neural networks).
  • Ensure distance-based algorithms (e.g., K-Means, KNN) behave properly.
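To make the first and third points concrete, here is a minimal sketch in plain Python. The age and income values and the helper names `euclidean` and `min_max_columns` are made up for illustration; they are not part of any library.

```python
import math

# Hypothetical samples: (age in years, income in dollars)
a = (25.0, 50_000.0)
b = (60.0, 51_000.0)  # very different age, similar income
c = (26.0, 52_000.0)  # similar age, somewhat different income


def euclidean(p, q):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def min_max_columns(rows):
    """Rescale every column to [0, 1]; assumes no column is constant."""
    cols = list(zip(*rows))
    lows = [min(col) for col in cols]
    highs = [max(col) for col in cols]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


# Raw distances: the income column dominates because its values are larger.
raw_ab = euclidean(a, b)
raw_ac = euclidean(a, c)

# After scaling, both columns contribute on the same footing.
sa, sb, sc = min_max_columns([a, b, c])
scaled_ab = euclidean(sa, sb)
scaled_ac = euclidean(sa, sc)

print(f"raw:    d(a,b)={raw_ab:.1f}  d(a,c)={raw_ac:.1f}")
print(f"scaled: d(a,b)={scaled_ab:.3f}  d(a,c)={scaled_ac:.3f}")
```

On the raw data, `b` looks like the nearest neighbour of `a` even though it is 35 years older, because the 1,000-dollar income gap outweighs the age gap. After scaling, `c` becomes the nearer point.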

Normalization vs Standardization

1. Normalization (Min-Max Scaling)

Formula:

x' = \frac{x - x_{\text{min}}}{x_{\text{max}} - x_{\text{min}}}

  • Rescales values to a [0, 1] range
  • Sensitive to outliers
  • Also called Min-Max Scaling

Best For:

  • When you know the bounds of your features
  • Algorithms like KNN, Neural Networks
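As a rough illustration of the formula and of the outlier sensitivity noted above, the following sketch applies min-max scaling to a small list. The function name `min_max_scale`, the sample values, and the choice to map a constant feature to 0.0 are assumptions made for this example.

```python
def min_max_scale(values):
    """Apply x' = (x - x_min) / (x_max - x_min) to a list of numbers."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: the formula would divide by zero.
        # Returning 0.0 here is an arbitrary choice for this sketch.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


clean = [10.0, 20.0, 30.0, 40.0, 50.0]
with_outlier = clean + [1000.0]

scaled_clean = min_max_scale(clean)
scaled_outlier = min_max_scale(with_outlier)

print(scaled_clean)    # evenly spread over [0, 1]
print(scaled_outlier)  # the first five values are squeezed near 0
```

A single extreme value sets `x_max`, so the remaining points end up compressed into a narrow band near 0.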

2. Standardization (Z-score Scaling)

Formula:

x' = \frac{x - \mu}{\sigma}

  • Rescales data to have mean = 0 and standard deviation = 1
  • Not bounded: values can be negative or greater than 1
  • Less sensitive to outliers than normalization, although extreme values still shift the mean and standard deviation

Best For:

  • When data is normally distributed
  • Algorithms like SVM, Logistic Regression, Linear Regression
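The z-score formula can be sketched with the standard library alone. The name `standardize` and the sample heights are hypothetical; the population standard deviation (`statistics.pstdev`) is used here, which matches what scikit-learn's `StandardScaler` computes.

```python
import statistics


def standardize(values):
    """Apply x' = (x - mu) / sigma using the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up sample
z = standardize(heights_cm)

print([round(v, 3) for v in z])
print("mean:", round(statistics.mean(z), 6))
print("std: ", round(statistics.pstdev(z), 6))
```

The result has mean 0 and standard deviation 1, and its values fall on both sides of zero and beyond ±1, so it is not confined to a fixed range the way min-max output is.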

Quick Comparison Table:

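| Aspect                | Normalization                   | Standardization         |
|-----------------------|---------------------------------|-------------------------|
| Output Range          | [0, 1]                          | Mean = 0, Std Dev = 1   |
| Sensitive to Outliers | Yes                             | Less sensitive          |
| Use Case              | Neural nets, KNN, deep learning | Linear models, SVM, PCA |
| Also Called           | Min-Max Scaling                 | Z-score Scaling         |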

Bonus Tip:

Use sklearn.preprocessing:

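import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Example feature matrix (rows = samples, columns = features)
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# Normalization: each column is rescaled to [0, 1]
minmax = MinMaxScaler()
X_minmax = minmax.fit_transform(X)

# Standardization: each column gets mean 0 and standard deviation 1
standard = StandardScaler()
X_standard = standard.fit_transform(X)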

TL;DR:

  • Normalize if you need bounded values (0 to 1)
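  • Standardize if your data looks Gaussian or has outliers (it is distorted less by them than min-max scaling)
  • Always scale your training and test data using the same parameters: fit the scaler on the training set only, then apply it to both!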
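The last point is the one most often missed, so here is a small standard-library sketch of it. The helpers `fit_standard` and `transform_standard` and the data are hypothetical; they stand in for calling `fit` on the training set and `transform` on each split with a scikit-learn scaler.

```python
import statistics


def fit_standard(train_values):
    """Learn the scaling parameters from the training data only."""
    return statistics.mean(train_values), statistics.pstdev(train_values)


def transform_standard(values, mu, sigma):
    """Apply previously learned parameters to any split."""
    return [(v - mu) / sigma for v in values]


train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up training feature
test = [3.0, 11.0]                                # made-up test feature

mu, sigma = fit_standard(train)  # mu = 5.0, sigma = 2.0

train_z = transform_standard(train, mu, sigma)
test_z = transform_standard(test, mu, sigma)  # reuses the training mu and sigma

print(test_z)  # [-1.0, 3.0]
```

Refitting on the test set would give the test values their own mean and spread, so the model would see inputs on a different scale from the one it was trained on, and information about the test set would leak into preprocessing.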
