K-Means Clustering
Working Principle
K-Means Clustering is a simple but powerful unsupervised learning algorithm. It aims to divide a given set of samples into "k" groups. The process is intuitively simple:
- Place k cluster centroids at random locations in the data space.
- Assign each sample in the data space to its nearest centroid.
- For each cluster, compute the geometric center (the mean of its samples) and move the centroid there.
- Repeat the second and third steps until the centroids no longer need to move.
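The steps above can be sketched in plain Python. This is a minimal illustration, not MAGIST's implementation: the function name `k_means` and its parameters are hypothetical, and it starts from k randomly chosen samples rather than arbitrary points in the data space, which is a common variant of the first step.

```python
import math
import random


def k_means(samples, k, max_iter=100, seed=0):
    """Cluster points with the plain K-Means loop; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1 (variant): start from k distinct samples chosen at random.
    centroids = rng.sample(samples, k)
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each sample to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in samples
        ]
        # Step 3: move each centroid to the mean of its assigned samples.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(samples, labels) if label == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Step 4: stop once no centroid has moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = k_means(points, k=2)
```

On this toy data the two tight groups of three points should end up in separate clusters. Which group receives label 0 depends on the random starting centroids.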
Automatically compute "k"
There is a way to estimate the number of clusters (k) you need. This technique uses an elbow curve: a line graph of the number of clusters "k" against the corresponding within-cluster variation. The variation drops as k grows, and the "elbow" where the drop levels off suggests a suitable k. This is not currently implemented in MAGIST, but there are plans to implement it soon.
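Until that lands, the elbow curve can be computed outside MAGIST. The sketch below uses scikit-learn's `KMeans` and its `inertia_` attribute (the within-cluster sum of squared distances) as the measure of variation. The helper name `inertia_curve` and the synthetic data are assumptions made for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_curve(samples, k_values):
    """Return the within-cluster sum of squares (inertia) for each candidate k."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(samples)
        inertias.append(float(model.inertia_))
    return inertias


# Synthetic data: three well-separated 2-D blobs, so the elbow should sit near k = 3.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
samples = np.vstack([c + rng.normal(size=(50, 2)) for c in centers])

k_values = list(range(1, 7))
inertias = inertia_curve(samples, k_values)
for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting `inertias` against `k_values` as a line graph gives the elbow curve. For data like this, the inertia should fall steeply up to k = 3 and flatten afterwards.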
Implementation Using MAGIST
Currently, MAGIST uses K-Means Clustering to mask images. It attempts to identify individual objects in an image, which can then be processed further to extract more information. For this, you must use UnsupervisedModels under the Vision directory. It can be imported like so:
- UnsupervisedModels contains a Python file named img_cluster.py that contains the RoughCluster class. This class holds all the methods needed to compute the clusters.
Note
This computation is done with the help of SciKit-Learn and Pandas.
Now, create an instance of the RoughCluster class:
- More information about the config file is located here: Setup Config File
Finally, we can compute the cluster:
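# `clusterer` is the RoughCluster instance created in the previous step.
n_of_clusters = 3                 # k, the number of clusters to find
img_location = "Data/test.jpg"    # path to the input image
img_size = (200, 200)             # image size passed to the clusterer
masked_img_dir = "Data/Clusters"  # directory for the masked output images
masked_img_locations = clusterer.unsupervised_clusters(n_of_clusters, img_location, img_size, masked_img_dir)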
Here is what masked_img_locations should look like:
- The number of images listed depends on n_of_clusters.
These images are the final masked results. Through the config file, you can enable verbose output to see the intermediate stage before the images are exported individually.
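To build intuition for how K-Means can yield one masked image per cluster, here is a rough sketch that clusters pixel colours with scikit-learn and blacks out everything outside each cluster. It is an assumption about the general technique, not a description of what RoughCluster does internally. The function name `cluster_masks` and the synthetic image are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_masks(image, n_clusters):
    """Split an H x W x 3 image into one masked copy per colour cluster."""
    height, width, channels = image.shape
    # Treat every pixel as one sample whose features are its colour channels.
    pixels = image.reshape(-1, channels).astype(float)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(pixels)
    label_map = labels.reshape(height, width)
    masked_images = []
    for cluster_id in range(n_clusters):
        masked = np.zeros_like(image)
        keep = label_map == cluster_id
        masked[keep] = image[keep]  # pixels outside the cluster stay black
        masked_images.append(masked)
    return masked_images


# Synthetic 4 x 6 test image with three flat colour regions.
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[:, :2] = (255, 0, 0)
image[:, 2:4] = (0, 255, 0)
image[:, 4:] = (0, 0, 255)

masks = cluster_masks(image, n_clusters=3)
```

Each entry in `masks` keeps the pixels of one cluster and leaves the rest black, so the number of masked images equals `n_clusters`.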