I'm trying to discretize a numerical variable using k-means. It worked pretty well, but I'm wondering how I can find the interval of values that each cluster covers.

I use FactoMineR to run the k-means. I found 3 clusters according to a graph (image not available).

My goal now is to identify the intervals of my numerical variable within the clusters.

Is there any option or method in FactoMineR or another package to do this? I can do it manually, but as I have to do it for a number of variables, I'd like to find an easy way to identify the intervals.

Laurent Magon
    Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Axeman Dec 15 '17 at 10:53

1 Answer

Since you have not provided data, I have used the example from the `kmeans` documentation, which produces two groups for data with two columns, `x` and `y`. You can split the original data by the cluster each row belongs to and then extract summary values from each group. I am not sure whether my example data resembles yours, but in the code below I simply take the minimum of column `x` and the maximum of column `y` in each cluster and report their difference as the width of a potential interval (whether this makes sense depends on the use case). Does that help you?

data <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(data) <- c("x", "y")

cl <- kmeans(data, 2)

data <- as.data.frame(cbind(data, cluster = cl$cluster))

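# Split the rows by cluster, then summarise each group separately
lapply(split(data, data$cluster), function(grp) {
  min_x <- min(grp$x)       # smallest x value in this cluster
  max_y <- max(grp$y)       # largest y value in this cluster
  width <- max_y - min_x    # distance between the two boundaries
  c(min_x = min_x, max_y = max_y, diff = width)
})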

# $`1`
# min_x      max_y       diff 
# -0.6906124  0.5123950  1.2030074 
# 
# $`2`
# min_x     max_y      diff 
# 0.2052112 1.6941800 1.4889688
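
If the goal is the interval of one numerical variable within each cluster, the same `split` idea can return the minimum and maximum of that variable per cluster. The following is only a minimal sketch under stated assumptions: it uses base R `kmeans` on a single variable, and the vector `x`, the seed, and the choice of three clusters are hypothetical and not taken from the question's data.

```r
set.seed(42)  # hypothetical seed, only so the sketch is reproducible

# Hypothetical one-dimensional variable with three well-separated groups
x <- c(rnorm(50, mean = 0, sd = 0.3),
       rnorm(50, mean = 3, sd = 0.3),
       rnorm(50, mean = 6, sd = 0.3))

# Cluster the single variable; nstart = 10 tries several random starts
cl <- kmeans(x, centers = 3, nstart = 10)

# range() gives c(min, max) per cluster; transpose to one row per cluster
intervals <- t(sapply(split(x, cl$cluster), range))
colnames(intervals) <- c("lower", "upper")

# Order the clusters by lower bound so the intervals read left to right
intervals <- intervals[order(intervals[, "lower"]), ]
intervals
```

When only one variable is clustered, the intervals should not overlap. If the clustering uses several variables at once, the per-variable ranges of different clusters can overlap, so they describe the clusters rather than define clean cut points.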
Manuel Bickel
  • Oh, thank you! It seems to be exactly what I was looking for. I thought of something similar, but I didn't use `split`. – Laurent Magon Dec 15 '17 at 12:10
  • As an additional hint, you might use `str(cl)` (`cl` from the code example above) to check the content of your `kmeans` output. Maybe there is more that you can use for your analysis. – Manuel Bickel Dec 15 '17 at 12:13
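
As a rough illustration of that hint, the sketch below rebuilds the example data from the `kmeans` documentation and looks at a few components of the fitted object. The seed is an assumption added for reproducibility; `centers`, `size`, and `withinss` are standard components of the object returned by base R `kmeans`.

```r
set.seed(1)  # hypothetical seed, only so the sketch is reproducible

# Same two-group example data as in the answer
data <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
              matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(data) <- c("x", "y")

cl <- kmeans(data, 2)

str(cl)       # compact overview of every component of the result
cl$centers    # cluster means, one row per cluster
cl$size       # number of points assigned to each cluster
cl$withinss   # within-cluster sum of squares, one value per cluster
```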