Lesson 8.4: Dimensionality Reduction – PCA
🔹 What is PCA?
PCA (Principal Component Analysis) is an unsupervised technique used to reduce the number of features (dimensions) in a dataset while retaining most of the variance (information).
- Helps simplify data, improve visualization, and speed up algorithms.
🔹 Key Concepts
- Principal Components (PCs): A new set of axes that capture maximum variance.
- Variance: The amount of information each component retains.
- Eigenvectors & Eigenvalues: The mathematical foundation of PCA, used to determine the directions of maximum variance.
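To make the eigenvalue idea concrete, here is a small NumPy sketch (with hypothetical random data) showing that the eigenvalues of the covariance matrix account for all of the data's variance — their sum equals the total variance (the trace of the covariance matrix):

```python
import numpy as np

# Hypothetical toy data: 100 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Covariance matrix of the data (features as columns)
cov = np.cov(X, rowvar=False)

# Eigenvalues = variance along each principal direction;
# eigenvectors = the directions themselves
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Total variance is preserved: the eigenvalues sum to the
# trace of the covariance matrix (sum of per-feature variances)
print(np.isclose(eigenvalues.sum(), np.trace(cov)))  # True
```

Picking the components with the largest eigenvalues is exactly how PCA keeps "most of the variance."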
🔹 How it Works
1. Standardize the dataset.
2. Compute the covariance matrix.
3. Calculate eigenvectors and eigenvalues.
4. Sort eigenvectors by eigenvalue → select the top N components.
5. Transform the data onto these principal components.
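The five steps above can be sketched directly in NumPy. This is an illustrative, from-scratch version (the function name `pca_transform` and the random data are placeholders, not part of any library):

```python
import numpy as np

def pca_transform(X, n_components):
    """Minimal PCA following the steps above (illustrative sketch)."""
    # 1. Standardize the dataset (zero mean, unit variance per feature)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Compute the covariance matrix
    cov = np.cov(X_std, rowvar=False)
    # 3. Calculate eigenvectors and eigenvalues
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort eigenvectors by descending eigenvalue, keep the top N
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order[:n_components]]
    # 5. Project the data onto the selected principal components
    return X_std @ components

# Hypothetical data: 200 samples, 5 features → reduced to 2 dimensions
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
X_reduced = pca_transform(X, n_components=2)
print(X_reduced.shape)  # (200, 2)
```

In practice you would use a tested library implementation rather than rolling your own, but the steps map one-to-one.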
🔹 Example
- `n_components` → Number of dimensions to keep
- `X_reduced` → Transformed dataset in lower dimensions
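The bullets above describe the scikit-learn interface; a minimal usage sketch (with hypothetical random data standing in for a real dataset) might look like this:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical dataset: 150 samples, 4 features
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))

pca = PCA(n_components=2)         # n_components → number of dimensions to keep
X_reduced = pca.fit_transform(X)  # X_reduced → data in lower dimensions

print(X_reduced.shape)  # (150, 2)
```

`pca.explained_variance_ratio_` then tells you how much of the total variance each kept component retains, which is useful for deciding how many components are enough.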
🔹 Advantages
- Reduces computational cost.
- Removes redundant features.
- Improves visualization of high-dimensional data.
🔹 Disadvantages
- Can lose interpretability (principal components are linear combinations of the original features).
- Only captures linear relationships.
✅ Quick Recap:
- PCA → Reduces dimensions while retaining most of the variance.
- Useful for simplification, visualization, and faster computation.
