Feature Scaling in Machine Learning
Feature scaling puts numerical features on a similar scale, which helps many models train faster and, in some cases, more accurately.
Some models (like KNN, SVM, neural networks) can be seriously affected by unscaled data!
Why Scale Features?
- Prevent features with large values from dominating the learning process.
- Improve convergence in gradient-based methods (like neural networks).
- Ensure distance-based algorithms (e.g., K-Means, KNN) behave properly.
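To make the "dominating" point concrete, here is a minimal sketch in plain Python. The two features (age and income), the three sample points, and the assumed feature bounds are all invented for illustration; they do not come from any real dataset.

```python
import math

# Hypothetical people described by (age in years, income in dollars).
a = (25, 50_000)
b = (60, 51_000)   # very different age, similar income
c = (26, 58_000)   # similar age, different income

def euclidean(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# On raw values, income differences swamp age differences.
raw_ab = euclidean(a, b)
raw_ac = euclidean(a, c)

# Assumed (hypothetical) bounds per feature, used for min-max scaling.
bounds = [(18, 90), (20_000, 120_000)]

def min_max(p):
    """Rescale each feature of p to [0, 1] using the assumed bounds."""
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(p, bounds))

scaled_ab = euclidean(min_max(a), min_max(b))
scaled_ac = euclidean(min_max(a), min_max(c))

print(f"raw:    d(a,b)={raw_ab:.1f}  d(a,c)={raw_ac:.1f}")
print(f"scaled: d(a,b)={scaled_ab:.3f}  d(a,c)={scaled_ac:.3f}")
```

On the raw values, `a` looks closer to `b` because the 35-year age gap barely registers next to income. After scaling, `a` is closer to `c`, so a nearest-neighbor method would pick a different neighbor.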
Normalization vs Standardization
1. Normalization (Min-Max Scaling)
Formula:
$$x' = \frac{x - x_{\text{min}}}{x_{\text{max}} - x_{\text{min}}}$$
- Rescales values to a [0, 1] range
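- Sensitive to outliers: a single extreme value sets the minimum or maximum and squeezes the other values into a narrow band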
- Also called Min-Max Scaling
Best For:
- When you know the bounds of your features
- Algorithms like KNN, Neural Networks
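As a worked example of the min-max formula, here is a small sketch in plain Python. The helper name `min_max_scale`, the sample heights, and the choice to return zeros for a constant feature are assumptions made for illustration.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] using x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the formula would divide by zero.
        # Returning zeros is one possible convention (an assumption here).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

heights_cm = [150, 160, 170, 180, 200]  # made-up sample data
scaled = min_max_scale(heights_cm)
print(scaled)  # [0.0, 0.2, 0.4, 0.6, 1.0]
```

The smallest value maps to 0, the largest to 1, and everything else lands proportionally in between.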
2. Standardization (Z-score Scaling)
Formula:
$$x' = \frac{x - \mu}{\sigma}$$
- Rescales data to have mean = 0 and standard deviation = 1
- Not bounded: values can be negative or greater than 1
- Less sensitive to outliers than normalization, though extreme values still shift the mean and standard deviation
Best For:
- When data is roughly normally distributed (helpful, but not a strict requirement)
- Algorithms like SVM, Logistic Regression, Linear Regression
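The z-score formula can be sketched the same way in plain Python. The helper name `standardize` and the sample scores are made up for illustration, and the sketch assumes the population standard deviation (dividing by n) is the one wanted.

```python
import statistics

def standardize(values):
    """Apply x' = (x - mu) / sigma using the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # Constant feature: avoid dividing by zero (one possible convention).
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data: mean 5, population std 2
z = standardize(scores)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Note that the results are centered on 0 and are not confined to any fixed range.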
Quick Comparison Table:
| Property | Normalization | Standardization |
|---|---|---|
| Output | Bounded to [0, 1] | Unbounded; mean = 0, std dev = 1 |
| Sensitive to Outliers | Yes | Less sensitive |
| Use Case | KNN, neural networks | Linear models, SVM, PCA |
| Also Called | Min-Max Scaling | Z-score Scaling |
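One rough way to see the "Sensitive to Outliers" row is to check how much each method's denominator grows when a single extreme value is added. The data below (the integers 1 to 99 plus one outlier of 1000) is invented for illustration, and this is only a sketch of the intuition, not a formal robustness measure.

```python
import statistics

inliers = list(range(1, 100))      # 1..99, made-up data
with_outlier = inliers + [1000]    # one extreme value added

def value_range(values):
    """Denominator of the min-max formula: max - min."""
    return max(values) - min(values)

# How much does each scaler's denominator grow when the outlier appears?
range_growth = value_range(with_outlier) / value_range(inliers)
std_growth = statistics.pstdev(with_outlier) / statistics.pstdev(inliers)

print(f"range (min-max denominator) grew {range_growth:.1f}x")
print(f"std (z-score denominator) grew {std_growth:.1f}x")
```

On this data the range grows roughly ten-fold while the standard deviation grows roughly three and a half-fold, so the outlier squeezes min-max output harder. Standardization is still affected, just less.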
Bonus Tip:
Use sklearn.preprocessing:

    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    # X is assumed to be array-like with shape (n_samples, n_features)

    # Normalization
    minmax = MinMaxScaler()
    X_scaled = minmax.fit_transform(X)

    # Standardization
    standard = StandardScaler()
    X_scaled = standard.fit_transform(X)

TL;DR:
- Normalize if you need bounded values (0 to 1)
- Standardize if your data looks roughly Gaussian or has outliers that would distort a min-max range
- Fit the scaler on the training data only, then apply those same parameters to both the training and test data!
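Here is a minimal sketch of that last point using scikit-learn. The synthetic data, the 75/25 split, and the random seeds are assumptions chosen only to make the example runnable; the key idea is calling `fit_transform` on the training set and plain `transform` on the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration: 100 samples, 2 features on different scales.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(50, 10, 100), rng.normal(5000, 2000, 100)])

X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse those same parameters

# Training columns have mean ~0 and std ~1 by construction;
# test columns are only approximately so, which is expected.
print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print(X_test_scaled.mean(axis=0), X_test_scaled.std(axis=0))
```

Fitting a second scaler on the test set would use different means and standard deviations, so the model would see test inputs on a different scale than the one it was trained on.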