R packages for CLARA clustering

An obvious way of clustering larger datasets is to extend existing methods so that they can cope with a larger number of objects. The focus here is on clustering a large number of objects rather than a small number of objects in high dimensions.

The CLARA (Clustering for Large Applications) algorithm is fully described in chapter 3 of Kaufman and Rousseeuw (1990). Compared to other partitioning methods such as PAM (Partitioning Around Medoids), it can deal with much larger datasets. Internally, this is achieved by considering sub-datasets of fixed size (sampsize) such that the time and storage requirements become linear in n rather than quadratic.
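Each sub-dataset is partitioned into k clusters using the same algorithm as in pam.
Once k representative objects (medoids) have been selected from the sub-dataset, each observation of the entire dataset is assigned to the nearest medoid.
This is repeated for several sub-datasets, and the clustering with the lowest average dissimilarity between the observations and their nearest medoid is kept as the result.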
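
The steps above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the code used by any R package: the function names (`clara`, `pam_swap`, `total_cost`) are made up for this post, the medoid search is a simplified greedy swap rather than the full PAM build and swap phases, and the defaults (5 samples, sample size 40 + 2k) are assumptions chosen to resemble commonly documented defaults.

```python
import random

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def total_cost(points, medoids, dist):
    # Sum over all points of the distance to the nearest medoid.
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam_swap(sample, k, dist, rng):
    # Simplified stand-in for PAM: random start, then greedy swaps
    # that are accepted only if they lower the cost on the sample.
    medoids = rng.sample(sample, k)
    best = total_cost(sample, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in sample:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(sample, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids

def clara(points, k, samples=5, sampsize=None, dist=manhattan, seed=0):
    rng = random.Random(seed)
    if sampsize is None:
        sampsize = min(len(points), 40 + 2 * k)  # assumed default
    best_medoids, best_cost = None, float("inf")
    for _ in range(samples):
        sample = rng.sample(points, sampsize)      # fixed-size sub-dataset
        medoids = pam_swap(sample, k, dist, rng)   # cluster the sub-dataset only
        cost = total_cost(points, medoids, dist)   # evaluate on the entire dataset
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    # Assign every observation to its nearest medoid.
    labels = [min(range(k), key=lambda j: dist(p, best_medoids[j])) for p in points]
    return best_medoids, labels, best_cost / len(points)
```

Note that the expensive part (the swap search) only ever sees `sampsize` points, while the full dataset is touched once per sample to compute the cost. That is where the linear growth in n comes from.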

R package ‘cluster’ (https://cran.r-project.org/web/packages/cluster/cluster.pdf)
provides the clara function, which computes a “clara” object, a list representing a clustering of the data into k clusters.
The distance metrics described here for calculating dissimilarities between observations are “euclidean” and “manhattan” (newer releases of the package documentation also list “jaccard”).
Euclidean distances are the square root of the sum of squared differences, and manhattan distances are the sum of absolute differences.
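
To make the two definitions concrete, here is a minimal Python sketch. The function names are illustrative only and are not part of the ‘cluster’ package.

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```

For the same pair of points the manhattan distance is never smaller than the euclidean one, so the two metrics can rank neighbours differently and lead to different medoids.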

R package ‘ClusterR’ (https://cran.r-project.org/web/packages/ClusterR/ClusterR.pdf)
has the Clara_Medoids function, which supports a wider range of distance metrics for calculating dissimilarities between observations: “euclidean”, “manhattan”, “chebyshev”,
“canberra”, “braycurtis”, “pearson_correlation”, “simple_matching_coefficient”,
“minkowski”, “hamming”, “jaccard_coefficient”, “Rao_coefficient” and “mahalanobis”.
It also has a threads argument specifying the number of cores to run in parallel. OpenMP is used to parallelize the different sample draws.
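
Because each sample draw is independent of the others, the draws are a natural unit of parallel work. The Python sketch below shows only that structure, with a pluggable distance function (chebyshev is used as the example). Everything in it is hypothetical: `one_draw` and `parallel_draws` are names invented for this post, the medoid search is a crude stand-in for PAM, and Python threads are used for readability even though they do not give the true multi-core speed-up that OpenMP provides in compiled code.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def chebyshev(a, b):
    # Largest absolute coordinate difference.
    return max(abs(x - y) for x, y in zip(a, b))

def cost(points, medoids, dist):
    # Sum over all points of the distance to the nearest medoid.
    return sum(min(dist(p, m) for m in medoids) for p in points)

def one_draw(points, k, sample_size, dist, seed):
    # One independent sample draw with its own random generator.
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)
    # Crude stand-in for PAM: keep the best of 20 random medoid sets.
    candidates = [rng.sample(sample, k) for _ in range(20)]
    medoids = min(candidates, key=lambda m: cost(sample, m, dist))
    # Score the chosen medoids on the entire dataset.
    return cost(points, medoids, dist), medoids

def parallel_draws(points, k, samples, sample_size, dist, threads):
    # Run the draws on a pool of workers and keep the cheapest result.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(
            lambda seed: one_draw(points, k, sample_size, dist, seed),
            range(samples)))
    return min(results, key=lambda r: r[0])
```

Seeding each draw separately keeps the result the same no matter how many workers are used, which makes runs with different thread counts comparable.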
