Lesson 8.3: DBSCAN Clustering
🔹 What is DBSCAN?
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an unsupervised clustering algorithm that groups densely packed points and marks sparse points as outliers.
- Does not require the number of clusters to be specified in advance.
- Handles clusters of arbitrary shape.
🔹 Key Concepts
- ε (epsilon): Maximum distance at which two points count as neighbors.
- MinPts: Minimum number of points within ε of a point for that point to lie in a dense region.
- Core Points: Points with ≥ MinPts neighbors within ε.
- Border Points: Points with < MinPts neighbors within ε that lie within ε of a core point.
- Noise Points: Points that are neither core nor border points, so they belong to no cluster.
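The three point types can be made concrete with a small sketch. This is an illustrative pure-Python version, not a library routine: the function name `classify_points` and the sample data are hypothetical, and it assumes a point counts as its own neighbor (conventions differ on this).

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'.

    Assumption: a point counts as its own neighbor.
    """
    # Indices of all points within eps of each point (the point itself included).
    neighborhoods = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(nbrs) >= min_pts for nbrs in neighborhoods]

    kinds = []
    for i, nbrs in enumerate(neighborhoods):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in nbrs):
            # Too few neighbors to be core, but within eps of a core point.
            kinds.append("border")
        else:
            kinds.append("noise")
    return kinds

# Illustrative data: a unit square, one point hanging off it, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
kinds = classify_points(points, eps=1.1, min_pts=3)
print(kinds)
# ['core', 'core', 'core', 'core', 'border', 'noise']
```

Here (2, 1) has only two points in its neighborhood (itself and (1, 1)), so it is not core, but (1, 1) is core, which makes it a border point. The point (8, 8) has no core point nearby and is noise.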
🔹 How it Works
- For each point, find all neighbors within distance ε.
- If the point has ≥ MinPts neighbors → Core point → starts a new cluster (unless it already belongs to one).
- Expand the cluster by adding every point reachable through neighboring core points.
- Points not reachable from any core point → Marked as noise.
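The steps above can be sketched from scratch in a few lines. This is a minimal, unoptimized version for illustration (every neighbor search scans all points); the names `dbscan` and `region_query` are hypothetical, and it again assumes a point counts as its own neighbor.

```python
from collections import deque
from math import dist

NOISE = -1
UNVISITED = None

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    def region_query(i):
        # All indices within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            # Provisional: may be relabeled as a border point later.
            labels[i] = NOISE
            continue
        # Point i is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # reachable non-core point: border
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                # Only core points pass their neighbors on for expansion.
                queue.extend(j_neighbors)
    return labels

# Illustrative data: two groups and one isolated point.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0),
    (8.0, 8.0), (8.0, 9.0), (9.0, 8.0),
    (20.0, 20.0),
]
labels = dbscan(points, eps=1.1, min_pts=3)
print(labels)
# [0, 0, 0, 0, 0, 1, 1, 1, -1]
```

The key design point is that only core points extend the queue. Border points join a cluster but never grow it, which is what stops a cluster from leaking across sparse gaps.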
🔹 Example
- eps → Radius within which to search for neighbors
- min_samples → Minimum number of points to form a cluster
- labels → Cluster assignments (-1 indicates noise)
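The names above match the parameters of scikit-learn's `DBSCAN` class, so a minimal sketch with that library might look as follows. It assumes scikit-learn and NumPy are installed; the data and the parameter values are made up for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative data: two tight groups and one isolated point.
X = np.array([
    [1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
    [8.0, 8.0], [8.1, 7.9], [7.9, 8.2],
    [25.0, 80.0],
])

# eps: neighborhood radius; min_samples: points needed in that
# neighborhood (the point itself included) for a core point.
model = DBSCAN(eps=0.5, min_samples=3)
labels = model.fit_predict(X)

print(labels.tolist())
# [0, 0, 0, 1, 1, 1, -1]
```

The last point has no neighbors within `eps`, so it gets the label -1. After fitting, the same assignments are also available as `model.labels_`.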
🔹 Advantages
- Can detect clusters of arbitrary shape.
- Automatically identifies outliers.
🔹 Disadvantages
- Sensitive to the choice of eps and MinPts.
- Struggles when clusters have very different densities, because a single eps and MinPts apply to the whole dataset.
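Both drawbacks can be explored with a short experiment. The sketch below, assuming scikit-learn and NumPy are installed, builds one tight blob and one loose blob (synthetic data chosen for illustration) and reports how the number of clusters and noise points changes with `eps`. The helper `summarize` is hypothetical, and the exact counts depend on the data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Illustrative data: a dense blob and a much sparser one.
tight = rng.normal(loc=(0.0, 0.0), scale=0.2, size=(100, 2))
loose = rng.normal(loc=(6.0, 6.0), scale=1.5, size=(100, 2))
X = np.vstack([tight, loose])

def summarize(eps, min_samples=5):
    """Return (number of clusters, number of noise points) for one eps."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_clusters = len(set(labels.tolist()) - {-1})
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise

for eps in (0.1, 0.3, 1.0, 3.0):
    n_clusters, n_noise = summarize(eps)
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```

On data like this, an `eps` small enough to suit the dense blob tends to mark much of the sparse blob as noise, while an `eps` large enough for the sparse blob risks merging groups that should stay separate.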
✅ Quick Recap:
- DBSCAN → Density-based clustering → groups dense points, identifies outliers.
- Works well for non-spherical clusters and noisy data.
