Partitioning (Clustering) Around Medoids

Partitioning (clustering) of the data into k clusters “around medoids” is a more robust version of k-means. Compared to k-means, the function pam from the R package cluster has the following features: (a) it also accepts a dissimilarity matrix; (b) it is more robust because it minimizes a sum of dissimilarities instead of a sum of squared Euclidean distances; (c) it provides a graphical display, the silhouette plot (see plot.partition); (d) it allows selecting the number of clusters using the average silhouette width, mean(silhouette(pr)[, "sil_width"]) or equivalently pr$silinfo$avg.width, on the result pr <- pam(..).
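As a rough illustration of points (a) and (d), here is a minimal sketch that fits pam on a dissimilarity object for several values of k and keeps the k with the largest average silhouette width. It uses ruspini, a small example data set shipped with the cluster package, as a stand-in for real data. The candidate range 2:8 and the names ks, avg_sil and best_k are my own assumptions for the example.

```r
library(cluster)

# ruspini ships with the cluster package; it only stands in for real data here.
d <- dist(ruspini)   # Euclidean dissimilarities between observations

# Candidate numbers of clusters (the range 2:8 is an arbitrary choice).
ks <- 2:8

# Fit pam for each k and record the average silhouette width.
avg_sil <- sapply(ks, function(k) {
  pr <- pam(d, k = k)      # pam accepts a dissimilarity object directly
  pr$silinfo$avg.width     # same value as mean(silhouette(pr)[, "sil_width"])
})

# Keep the k with the largest average silhouette width.
best_k <- ks[which.max(avg_sil)]
print(data.frame(k = ks, avg_sil = round(avg_sil, 3)))
print(best_k)
```

The average silhouette width is only a heuristic, so it is worth looking at the silhouette plot for the chosen k as well before settling on it.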

For example, the resulting clusters can support further analysis of response times obtained from log files (see the example in my blog post Anonymous Named Pipes).

R code:

library(xlsx)
data1=read.xlsx('C:/Users/emiltom/Google disk/OSBData.xlsx',1,colIndex=c(1:5,7,8),colClasses=c("character","integer","integer","double","double","integer","integer"))
data1$Type=as.factor(data1$Type)
data1$Code=as.factor(data1$Code)
data1$Day=as.factor(data1$Day)
data1$TypeNum=as.numeric(data1$Type)
data1$CodeNum=as.numeric(data1$Code)
data1$DayNum=as.numeric(data1$Day)

library(cluster)
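# Type, Code and Day are factors, so daisy() uses Gower's coefficient;
# stand=TRUE is then ignored and Gower's range standardization is applied instead
distances <- daisy(data1[c("Type","Code","Count","Avg","Max","Seconds","Day")],stand=TRUE)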
mycluster <- pam(distances,k=6)
plot(mycluster)

clusplot(mycluster)
clusplot(data1, mycluster$cluster, color=TRUE, shade=TRUE, labels=1, lines=0)

si <- silhouette(mycluster)
sidf = data.frame(observation=as.numeric(rownames(si)),cluster=as.factor(si[,c("cluster")]),sil_width=si[,c("sil_width")])

data1$observation <- as.numeric(rownames(data1))

data1 <- merge(data1,sidf,by="observation",all=TRUE)

library("ggplot2")
library(scales) #for date_breaks()
library(RColorBrewer)

ggplot(data1, aes(x=Max,colour=cluster)) +
  geom_point(aes(y=Avg),size=3) +
  scale_colour_brewer(palette="Set1") +
  ylab(label="Avg") +
  scale_x_continuous(limits = c(0, NA)) +
  scale_y_continuous(limits = c(0, NA)) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90))

ggsave("c:/Temp/Max_Avg_50_5.png", units="mm", width=190, height=134, dpi=600)
