
When I cluster a dataset with Mclust (from the mclust package), I use the following code:

library(mclust)

i <- 2  # number of mixture components (clusters)
print(paste("Number of clusters =", i))
cluster_model1 <- Mclust(cc[2:6], G = i)

When I repeat the clustering, the cluster classification (id) can stay the same from one run to the next, or it can swap, with 1 becoming 2 and 2 becoming 1. Is it possible to fix the cluster ids so that they do not change arbitrarily? I want to count how many times the data from 10 imputed datasets fall in cluster 1 or cluster 2, and I can only calculate this if the cluster ids stay the same.

The dataset cc contains this data:

head(cc[2:6])
   ea  pa  sa  en  pn
1 1.0 1.0 1.0 2.2 1.6
2 3.2 2.4 1.0 3.2 1.8
3 1.2 1.0 1.0 2.0 1.0
4 1.6 1.2 1.2 1.0 1.2
5 3.6 1.0 1.6 4.0 2.6
6 1.6 1.0 1.4 1.4 1.2

When I cluster, the classification could be

head(cluster_model1$classification)
[1] 2 1 2 1 1 1

or

head(cluster_model1$classification)
[1] 1 2 1 2 2 2

While the clustering results are correct either way, is it possible to get 2 1 2 1 1 1 every time the clustering is run?

Misha
  • Clustering depends on a bit of randomness to start the clusters, so results aren't identical across runs. And there's no real meaning to what gets called group 1 vs group 2; you could give the clusters any names you want. You need some reference point, perhaps to say whether or not points lie in the same cluster as that reference point. It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Oct 11 '21 at 19:24
  • @MrFlick I have added the dataset and the results. Is it possible to set a reference point in MClust so that the cluster id remains the same? – Misha Oct 11 '21 at 19:48
  • Remain the same as what exactly? What is your reference point? Do you have a known value that should always be in cluster "1"? – MrFlick Oct 11 '21 at 19:49
  • I do not set the reference point when using MClust. I do not see any way to set a reference point. This question is to inquire if there is another way to get reproducible cluster classification id. – Misha Oct 11 '21 at 19:56
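
The "reference point" idea from the comments can be sketched in base R as a relabeling step applied after each clustering run. This is only a sketch under assumptions: the helper relabel_by_mean is hypothetical (it is not part of mclust), the choice of ea as the reference variable is arbitrary, and the approach only helps when the runs find the same groups under swapped labels. The toy vectors reuse the six ea values and the two labelings shown in the question.

```r
# Hypothetical helper (not part of mclust): renumber clusters so that id 1 is
# always the cluster with the smallest mean on a reference variable, id 2 the
# next smallest, and so on.
relabel_by_mean <- function(classification, x) {
  # x: numeric reference variable, same length as classification
  cluster_means <- tapply(x, classification, mean)
  # names(cluster_means) are the old ids; their ranks become the new ids
  new_ids <- rank(cluster_means, ties.method = "first")
  unname(new_ids[as.character(classification)])
}

# The six 'ea' values and the two labelings reported in the question
ea    <- c(1.0, 3.2, 1.2, 1.6, 3.6, 1.6)
run_a <- c(2, 1, 2, 1, 1, 1)
run_b <- c(1, 2, 1, 2, 2, 2)

# Both runs now carry the same ids
fixed_a <- relabel_by_mean(run_a, ea)
fixed_b <- relabel_by_mean(run_b, ea)
print(fixed_a)
print(identical(fixed_a, fixed_b))

# With a fitted model this might be used as (assuming cc has a column 'ea'):
# ids <- relabel_by_mean(cluster_model1$classification, cc$ea)
```

To have the high-mean cluster numbered 1 instead, rank on -cluster_means. Once every imputed dataset has been relabeled this way, the ids can be tabulated across datasets, with the caveat that the imputations may not all produce the same partition.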

0 Answers