Lesson 3.6: Feature Scaling – Normalization, Standardization
Feature scaling is an essential step in data preprocessing. It brings all features to a similar scale so that they contribute to the model on comparable terms. Without scaling, features with larger numerical ranges may dominate the learning process, leading to biased results.
1. Why Is Feature Scaling Needed?
-
Machine learning algorithms such as KNN, SVM, Logistic Regression, and Neural Networks are sensitive to the scale of the data.
-
For example, if one feature is in kilometers (0–1000) and another in meters (0–1), a distance-based algorithm will give far more weight to the feature with the larger numerical range, regardless of which one is more informative.
-
Scaling solves this issue by adjusting features to the same scale.
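To make the effect concrete, here is a small sketch using only the Python standard library. The three sample points and the helper names `min_max` and `scale` are made up for illustration; the feature ranges (0–1000 and 0–1) follow the example above.

```python
import math

# Hypothetical samples: (feature in 0-1000, feature in 0-1)
a = (100.0, 0.1)
b = (900.0, 0.9)
c = (120.0, 0.9)

# Raw Euclidean distances: the first feature swamps the second.
raw_ab = math.dist(a, b)
raw_ac = math.dist(a, c)

def min_max(value, lo, hi):
    """Map value from [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)

def scale(point):
    """Scale both features to [0, 1] using their known ranges."""
    return (min_max(point[0], 0.0, 1000.0), min_max(point[1], 0.0, 1.0))

# Distances after scaling: both features now carry comparable weight.
scaled_ab = math.dist(scale(a), scale(b))
scaled_ac = math.dist(scale(a), scale(c))

print(f"raw:    d(a,b)={raw_ab:.3f}  d(a,c)={raw_ac:.3f}")
print(f"scaled: d(a,b)={scaled_ab:.3f}  d(a,c)={scaled_ac:.3f}")
```

Before scaling, `a` and `c` look close because the second feature barely affects the distance, even though the two points differ by 80% of that feature's range. After scaling, that difference shows up in the distance.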
2. Types of Feature Scaling
There are two commonly used techniques:
a) Normalization (Min-Max Scaling)
-
Rescales features into a range of 0 to 1 (or -1 to 1 in some cases).
-
Formula:
$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$$
-
Best when the distribution is not Gaussian and the data needs bounded values. Because the result depends on the minimum and maximum, it is sensitive to outliers.
-
Example: Image pixel values are often normalized between 0 and 1.
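The min-max formula can be sketched directly in NumPy. This is a minimal illustration, not a library implementation: the function name `min_max_scale`, its `new_min`/`new_max` parameters, and the sample pixel values are assumptions made for the example.

```python
import numpy as np

def min_max_scale(x, new_min=0.0, new_max=1.0):
    """Rescale x column-wise to [new_min, new_max] with min-max scaling."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    span = x_max - x_min
    # A constant column has span 0; use 1 to avoid dividing by zero.
    # Such a column ends up mapped to new_min.
    span = np.where(span == 0, 1.0, span)
    unit = (x - x_min) / span  # values in [0, 1]
    return unit * (new_max - new_min) + new_min

# Hypothetical 8-bit pixel intensities
pixels = np.array([0, 64, 128, 255])
scaled_pixels = min_max_scale(pixels)             # range [0, 1]
centered_pixels = min_max_scale(pixels, -1.0, 1.0)  # range [-1, 1]

print(scaled_pixels)
print(centered_pixels)
```

The second call shows the -1 to 1 variant mentioned above: the same formula is applied first, then the unit interval is stretched and shifted.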
b) Standardization (Z-score Normalization)
-
Transforms data to have mean = 0 and standard deviation = 1.
-
Formula:
$$X_{std} = \frac{X - \mu}{\sigma}$$
where μ = mean, σ = standard deviation.
-
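Useful when data approximately follows a Gaussian distribution. Standardization changes only the location and spread of the data; it does not make a non-Gaussian distribution Gaussian.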
-
Works well with algorithms that are sensitive to feature variance or are trained with gradient descent (e.g., regularized Linear Regression, Logistic Regression).
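The z-score formula can be sketched the same way in NumPy. The function name `standardize` and the height values are hypothetical, and the sketch assumes the population standard deviation (`ddof=0`), which is one common convention.

```python
import numpy as np

def standardize(x):
    """Standardize x column-wise to mean 0 and standard deviation 1."""
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)  # population standard deviation (ddof=0)
    # A constant column has sigma 0; use 1 to avoid dividing by zero.
    sigma = np.where(sigma == 0, 1.0, sigma)
    return (x - mu) / sigma

# Hypothetical heights in centimeters
heights_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
z = standardize(heights_cm)

print(z)
print(z.mean(), z.std())
```

Unlike min-max scaling, the output is not confined to a fixed interval: values far from the mean simply receive large positive or negative z-scores.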
3. Example in Python
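A minimal sketch of both techniques, assuming scikit-learn and NumPy are installed. `MinMaxScaler` and `StandardScaler` are scikit-learn's preprocessing classes; the age and salary values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: columns are age (years) and salary
X = np.array([
    [25, 30000],
    [35, 50000],
    [45, 70000],
    [55, 90000],
], dtype=float)

# Normalization: each column is rescaled to [0, 1]
X_norm = MinMaxScaler().fit_transform(X)

# Standardization: each column gets mean 0 and standard deviation 1
X_std = StandardScaler().fit_transform(X)

print("Normalized:\n", X_norm)
print("Standardized:\n", X_std)
```

Both scalers work column by column, so age and salary end up on comparable scales even though their raw ranges differ by several orders of magnitude.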
4. When to Use Which?
-
Normalization → When features have different scales and you need them bounded between 0 and 1 (e.g., neural networks, distance-based algorithms like KNN).
-
Standardization → When data is approximately normally distributed, or when the algorithm is sensitive to feature variance (e.g., regularized regression models, PCA).
✅ In short:
-
Normalization → scales to [0,1]
-
Standardization → mean = 0, std = 1
Both keep features with large numerical ranges from dominating a machine learning model.
