Lesson 3.6: Feature Scaling – Normalization, Standardization

Data Science and Machine Learning Basics

Feature scaling is an essential step in data preprocessing. It brings features onto a similar scale so that no feature contributes more to the model simply because it is measured in larger units. Without scaling, features with larger numerical ranges can dominate the learning process in scale-sensitive algorithms, leading to biased results.

1. Why Is Feature Scaling Needed?

Machine learning algorithms such as KNN, SVM, Logistic Regression, and Neural Networks are sensitive to the scale of the data.
For example, if one feature is measured in meters (range 0–1000) and another in kilometers (range 0–1), the algorithm may give more importance to the feature with the larger numerical range.
Scaling solves this by bringing all features to a comparable scale.
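
A minimal sketch of the problem, assuming NumPy and a hypothetical three-row dataset whose first column spans hundreds and whose second column spans 0 to 1. It finds which row is closest to row 0 by Euclidean distance (the comparison KNN relies on) before and after min-max scaling. The helper `nearest_to_first` is illustrative, not a library function.

```python
import numpy as np

# Hypothetical toy data: column 0 spans hundreds, column 1 spans 0-1
X = np.array([
    [100.0, 0.90],
    [200.0, 0.10],
    [400.0, 0.85],
])

def nearest_to_first(data):
    """Return the index (1 or 2) of the row closest to row 0 by Euclidean distance."""
    dists = np.linalg.norm(data[1:] - data[0], axis=1)
    return int(np.argmin(dists)) + 1

# Min-max scale each column to [0, 1] using that column's own min and max
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print("Nearest row on raw data:   ", nearest_to_first(X))
print("Nearest row on scaled data:", nearest_to_first(X_scaled))
```

On the raw data, row 1 is the nearest neighbor because the distance is driven almost entirely by column 0. After scaling, both columns count, and row 2 becomes the nearest neighbor.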

2. Types of Feature Scaling

There are two commonly used techniques:

a) Normalization (Min-Max Scaling)

Rescales features into a range of 0 to 1 (or -1 to 1 in some cases).
Formula:

$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$
Often a good choice when the distribution is not Gaussian and the model needs bounded values.
Example: Image pixel values (0–255) are often normalized to the range 0 to 1 by dividing by 255.
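
The formula above translates directly into code. The sketch below assumes NumPy; `min_max_scale` is a hypothetical helper written for illustration, and its optional `feature_range` argument covers the -1 to 1 variant by stretching the [0, 1] result linearly.

```python
import numpy as np

def min_max_scale(x, feature_range=(0.0, 1.0)):
    """Rescale a 1-D array using X_norm = (X - X_min) / (X_max - X_min),
    then stretch the [0, 1] result to feature_range."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    lo, hi = feature_range
    if x_max == x_min:
        # Constant feature: avoid dividing by zero and map every value to the lower bound
        return np.full_like(x, lo)
    x_norm = (x - x_min) / (x_max - x_min)   # values now lie in [0, 1]
    return x_norm * (hi - lo) + lo           # map [0, 1] onto [lo, hi]

ages = np.array([20, 30, 40, 60])
print(min_max_scale(ages))            # values: 0.0, 0.25, 0.5, 1.0
print(min_max_scale(ages, (-1, 1)))   # values: -1.0, -0.5, 0.0, 1.0

# Pixel intensities in 0-255: dividing by 255 is min-max scaling with X_min = 0, X_max = 255
pixels = np.array([0, 64, 128, 255])
pixels_norm = pixels / 255.0
print(pixels_norm)
```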

b) Standardization (Z-score Normalization)

Transforms data to have mean = 0 and standard deviation = 1.
Formula:

$X_{std} = \frac{X - \mu}{\sigma}$

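where μ is the mean and σ is the standard deviation of the feature.
Useful when the data roughly follows a Gaussian distribution.
Works well with models such as Linear Regression and Logistic Regression, especially when they are regularized or trained with gradient descent. These models do not strictly require normally distributed features; standardization mainly helps optimization converge and makes a regularization penalty treat all features evenly.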
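
A matching sketch for standardization, again assuming NumPy; `standardize` is a hypothetical helper. It uses the population standard deviation (`ddof=0`, NumPy's default for `std`), which is also the convention scikit-learn's `StandardScaler` follows.

```python
import numpy as np

def standardize(x):
    """Z-score a 1-D array using X_std = (X - mu) / sigma."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std()   # population standard deviation (ddof=0)
    if sigma == 0:
        # Constant feature: every value equals the mean, so return zeros
        return np.zeros_like(x)
    return (x - mu) / sigma

scores = np.array([2, 4, 4, 4, 5, 5, 7, 9])
z = standardize(scores)
# Here mu = 5 and sigma = 2, so z is -1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0
print(z)
print(z.mean(), z.std())   # approximately 0.0 and 1.0
```

Unlike min-max scaling, the result is not bounded to a fixed range: a value far from the mean simply gets a large z-score.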

3. Example in Python
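
A minimal sketch using scikit-learn's `MinMaxScaler` and `StandardScaler` from `sklearn.preprocessing`, assuming scikit-learn and NumPy are installed. The age/income data is made up for illustration. Both scalers work column by column, so each feature is scaled independently.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: column 0 = age in years, column 1 = annual income
X = np.array([
    [25, 40_000],
    [35, 60_000],
    [45, 80_000],
    [55, 120_000],
], dtype=float)

# Normalization: each column is rescaled to [0, 1]
min_max = MinMaxScaler()
X_norm = min_max.fit_transform(X)

# Standardization: each column is rescaled to mean 0 and standard deviation 1
std_scaler = StandardScaler()
X_std = std_scaler.fit_transform(X)

print("Min-max scaled:\n", X_norm)
print("Standardized:\n", X_std)
print("Means after standardization:", X_std.mean(axis=0))
print("Stds after standardization: ", X_std.std(axis=0))
```

In a real pipeline, the scaler is typically fitted on the training set only and then applied to validation and test data with `transform`, so the scaling parameters (min/max or mean/std) do not leak information from the test set.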

4. When to Use Each?

Normalization → If features have different scales and you need them in the range 0–1 (e.g., neural networks, distance-based algorithms like KNN).
Standardization → If data is roughly normally distributed, or when the algorithm is sensitive to feature variance or uses regularization (e.g., PCA, regularized regression models).

✅ In short:

Normalization → scales to [0,1]
Standardization → mean = 0, std = 1

Both put features on a comparable scale so that no feature dominates a model merely because of its units or range.