Lesson 8.1: K-Means Clustering
🔹 What is K-Means Clustering?
K-Means is an unsupervised learning algorithm used to group similar data points into clusters.
- “K” = number of clusters to create.
- Goal → minimize the total squared distance between each point and the centroid of the cluster it is assigned to.
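This goal is commonly written as the within-cluster sum of squared distances, which scikit-learn reports as `inertia_`. The symbols below (clusters C_1, ..., C_K with centroids μ_1, ..., μ_K) are notation introduced here for illustration:

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{\lvert C_k \rvert} \sum_{x_i \in C_k} x_i
```

For a fixed assignment of points to clusters, the mean is exactly the point that minimizes the squared distances to a cluster's members, which is why K-Means recalculates each centroid as a mean.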
🔹 How it Works
1. Choose K (the number of clusters).
2. Initialize K centroids randomly (for example, by picking K distinct data points at random).
3. Assign each data point to its nearest centroid.
4. Recalculate each centroid as the mean of the points assigned to it.
5. Repeat steps 3–4 until the centroids stabilize (they no longer change significantly).
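The five steps above can be sketched from scratch with NumPy. This is a minimal illustration under simple assumptions (initial centroids are randomly chosen data points, distance is Euclidean, an empty cluster keeps its previous centroid); the function names `assign` and `kmeans` and their parameters are hypothetical, chosen only for this example.

```python
import numpy as np

def assign(X, centroids):
    """Step 3 (hypothetical helper): index of the nearest centroid for each point."""
    # Squared Euclidean distance from every point to every centroid: shape (n, k)
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Minimal K-Means sketch (hypothetical API). Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Step 2: initialize centroids as k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_iter):
        labels = assign(X, centroids)                 # Step 3
        new_centroids = centroids.copy()
        for j in range(k):                            # Step 4
            members = X[labels == j]
            if len(members) > 0:                      # empty cluster: keep old centroid
                new_centroids[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:                               # Step 5: centroids have stabilized
            break
    # Final assignment so the labels match the returned centroids
    return assign(X, centroids), centroids

# Toy data: two well-separated groups of three points each
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 1.4],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.2]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

Because the starting centroids are random, a different `seed` can lead to a different (sometimes worse) final clustering; library implementations usually rerun the loop from several starts and keep the run with the lowest total squared distance.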
🔹 Example
- X → input data
- labels → cluster assigned to each point
- centroids → coordinates of the cluster centers
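Assuming the example uses scikit-learn (the names `X`, `labels` and `centroids` suggest it, but that is an assumption), a small sketch that produces these three variables from made-up toy data could look like this:

```python
import numpy as np
from sklearn.cluster import KMeans

# X: input data, one row per point (toy values chosen for illustration)
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6],
              [8.0, 8.0], [9.0, 11.0], [8.0, 9.0]])

# n_init=10 runs K-Means from 10 different initial centroid sets and keeps the best run
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)        # cluster index assigned to each point
centroids = kmeans.cluster_centers_   # coordinates of the cluster centers, one row per cluster

print(labels)
print(centroids)
```

The cluster numbering is arbitrary: which group is labeled 0 and which is labeled 1 carries no meaning.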
🔹 Advantages
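- Simple, and fast even on large datasets: each iteration takes time roughly proportional to (number of points) × K × (number of features).
- Works well when clusters are roughly spherical, similar in size, and well separated.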
🔹 Disadvantages
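- K must be specified in advance.
- Sensitive to outliers (they pull the mean-based centroids toward them) and to the initial centroid placement (different starting centroids can converge to different clusterings).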
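The outlier sensitivity follows from how centroids are updated: a centroid is an arithmetic mean, so a single extreme point can drag it far from the bulk of its cluster. A small NumPy illustration with made-up coordinates:

```python
import numpy as np

# Four points forming a tight square, then the same points plus one far-away outlier
cluster = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]])
with_outlier = np.vstack([cluster, [[50.0, 50.0]]])

print(cluster.mean(axis=0))       # → [1.5 1.5]
print(with_outlier.mean(axis=0))  # → [11.2 11.2]
```

One outlier moves the centroid from the middle of the square to a point far from every real member of the cluster.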
✅ Quick Recap:
- K-Means → Partition data into K clusters by minimizing distance to centroids.
- Iteratively assigns points and recalculates centroids until stable.
