Lesson 8.2: Hierarchical Clustering
🔹 What is Hierarchical Clustering?
Hierarchical Clustering is an unsupervised learning algorithm used to group data into a tree-like structure called a dendrogram.
- Does not require specifying the number of clusters initially.
- Two types:
  - Agglomerative (Bottom-Up) – Start with each point as its own cluster and merge.
  - Divisive (Top-Down) – Start with all points in one cluster and split.
🔹 How it Works (Agglomerative)
1. Treat each data point as a single cluster.
2. Calculate the distance between clusters (e.g., Euclidean or Manhattan distance).
3. Merge the two closest clusters.
4. Repeat until all points form a single cluster or a stopping criterion is met.
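The steps above can be sketched with a minimal, hand-rolled agglomerative loop. This is a toy illustration (single linkage, tiny 2D dataset, stopping when 2 clusters remain), not an optimized implementation:

```python
# Toy sketch of agglomerative clustering (single linkage).
import math

points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]

# Step 1: each point starts as its own cluster (stored as index lists).
clusters = [[i] for i in range(len(points))]

def cluster_distance(c1, c2):
    # Single linkage: distance between the closest pair of points.
    return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

# Steps 2-4: repeatedly merge the two closest clusters
# until the stopping criterion (k clusters remain) is met.
k = 2
while len(clusters) > k:
    a, b = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
    )
    clusters[a].extend(clusters[b])
    del clusters[b]

print(clusters)  # → [[0, 1], [2, 3, 4]]
```

The same loop can be stopped at any level, which is why the full merge history forms a tree rather than a single flat partition.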
🔹 Example
- linkage → Computes the hierarchical clustering (SciPy: scipy.cluster.hierarchy.linkage).
- dendrogram → Visualizes the cluster hierarchy (SciPy: scipy.cluster.hierarchy.dendrogram).
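A short sketch of how these two functions fit together, assuming SciPy is installed (the blob data and Ward linkage choice are illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Two well-separated 2D blobs of 10 points each (toy data).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(5, 0.5, (10, 2))])

# linkage computes the full merge hierarchy; 'ward' merges the pair of
# clusters that least increases within-cluster variance.
Z = linkage(X, method="ward")

# Cut the tree to get flat cluster labels (here: 2 clusters).
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)

# dendrogram visualizes the hierarchy (requires matplotlib):
# import matplotlib.pyplot as plt
# dendrogram(Z)
# plt.show()
```

Note that the number of clusters is chosen only when cutting the tree with fcluster, after the full hierarchy has been computed.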
🔹 Advantages
- No need to predefine the number of clusters.
- Provides a visual representation of cluster relationships.
🔹 Disadvantages
- Computationally expensive for large datasets.
- Sensitive to noise and outliers.
✅ Quick Recap:
- Hierarchical Clustering → Builds a tree of clusters (dendrogram).
- Agglomerative = merge clusters, Divisive = split clusters.
