Density-based clustering locates regions of high density that are separated from one another by regions of low density. DBSCAN is a simple and effective density-based clustering algorithm that illustrates a number of important concepts that are important for any density-based clustering approach.
Traditional Density: Center-Based Approach
Although there are not as many approaches for defining density as there are for defining similarity, there are several distinct methods. In the center-based approach, density is estimated for a particular point in the data set by counting the number of points within a specified radius, $Eps$, of that point. This includes the point itself. The following diagram illustrates this technique.
Classification of points according to center based density
- Core points: These points are in the interior of a density-based cluster. A point is a core point if the number of points within a given neighborhood around the point as determined by the distance function and a user-specified distance parameter, $Eps$, exceeds a certain threshold, $MinPts$, which is also a user-specified parameter.
- Border points: A border point is not a core point, but falls within the neighborhood of a core point.
- Noise points: A noise point is any point that is neither a core point nor a border point.
The following diagram illustrates all types of points
Given the previous definitions of core points, border points, and noise points, the DBSCAN algorithm can be informally described as follows. Any two core points that are close enough—within a distance Eps of one another—are put in the same cluster. Likewise, any border point that is close enough to a core point is put in the same cluster as the core point. (Ties may need to be resolved if a border point is close to core points from different clusters.) Noise points are discarded.
Time and space complexity
$$time = O(m \times time\ to\ find\ points\ in\ the\ Eps-neighborhood)$$ $$Space = O(m^2)$$