I am trying to "unshuffle" the rows of a matrix containing the centroids of some clusters, which are not in the same order as the order in which the samples were assigned to the clusters. Initially, for each row of the means matrix, I compared the absolute distance between its data points and those of each row of cluster centers, and assigned the index of the row with the smallest distance. Of course, I am not allowed to have duplicate indexes. It worked pretty well, but symmetric values cause a problem (i.e., because of the absolute value in the distance, mirror clusters were not ordered properly). I also tried to order them based on the variance, but that did not work as expected. I have been looking at the order() and sort() functions and found an example, which did not work:
order(means)
order(means)[centers]
sort(order(means)[centers])
means[sort(order(means)[centers])]
I also tried
apply(means == centers, 1, all)
but of course that just returns FALSE everywhere, since the values are close but never exactly equal.
A sample of the matrices:
means <- c(0.055190097, 0.032412395, 0.015372307, -0.008012372,
-0.018736792, -0.078138715, -0.058707713, -0.044020629,
-0.023750329, -0.014402083, -0.069920581, -0.064429216,
-0.059913345, -0.052302253, -0.047874074, 0.050557395,
0.047246979, 0.044577065, 0.040384336, 0.038140009,
0.114954601, 0.108110051, 0.102531680, 0.093341425, 0.088140310)
dim(means) <- c(5,5)
means <- t(means)
centers <- c(-0.038754, -0.021588, -0.008851, 0.008579, 0.016579,
0.018371, 0.006095, -0.003026, -0.015537, -0.021286,
-0.078143, -0.069267, -0.062197, -0.051295, -0.045521,
0.033145, 0.033348, 0.033354, 0.032947, 0.032511,
0.115464, 0.105248, 0.097172, 0.084732, 0.078162)
dim(centers) <- c(5,5)
centers <- t(centers)
For instance, in the example above, row 2 of the means matrix corresponds to row 3 of the centers matrix, as it is the closest in distance (data point by data point). So I have to find which row of means corresponds to which row of centers, with no duplicates. My real matrices are bigger, but this should be enough as an example. Do you have any suggestions? Thank you.
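To make the goal concrete, here is a minimal base R sketch of the kind of one-to-one matching described above. It rests on several assumptions that are not part of the original attempt: distance is taken as the plain Euclidean distance between whole rows (signed differences, so mirror-image rows are not confused), the assignment is greedy (the closest remaining pair is fixed first), and `match_rows` is a hypothetical helper name. A greedy pass is not guaranteed to give the assignment with the smallest total distance; it is only meant to illustrate the no-duplicates constraint.

```r
# Sample data from the question, entered row by row
means <- matrix(c(
   0.055190097,  0.032412395,  0.015372307, -0.008012372, -0.018736792,
  -0.078138715, -0.058707713, -0.044020629, -0.023750329, -0.014402083,
  -0.069920581, -0.064429216, -0.059913345, -0.052302253, -0.047874074,
   0.050557395,  0.047246979,  0.044577065,  0.040384336,  0.038140009,
   0.114954601,  0.108110051,  0.102531680,  0.093341425,  0.088140310
), nrow = 5, byrow = TRUE)

centers <- matrix(c(
  -0.038754, -0.021588, -0.008851,  0.008579,  0.016579,
   0.018371,  0.006095, -0.003026, -0.015537, -0.021286,
  -0.078143, -0.069267, -0.062197, -0.051295, -0.045521,
   0.033145,  0.033348,  0.033354,  0.032947,  0.032511,
   0.115464,  0.105248,  0.097172,  0.084732,  0.078162
), nrow = 5, byrow = TRUE)

# Hypothetical helper (not from any package): greedily match each row of `a`
# to a distinct row of `b`, closest remaining pair first.
match_rows <- function(a, b) {
  stopifnot(nrow(a) == nrow(b), ncol(a) == ncol(b))
  n <- nrow(a)

  # d[i, j] = Euclidean distance between a[i, ] and b[j, ]
  d <- as.matrix(dist(rbind(a, b)))[seq_len(n), n + seq_len(n), drop = FALSE]

  assignment <- integer(n)
  for (k in seq_len(n)) {
    idx <- arrayInd(which.min(d), dim(d))  # (row, column) of the smallest remaining distance
    assignment[idx[1]] <- idx[2]
    d[idx[1], ] <- Inf                     # this row of `a` is now taken
    d[, idx[2]] <- Inf                     # this row of `b` is now taken
  }
  assignment
}

# perm[i] is the row of `centers` matched to row i of `means`
perm <- match_rows(means, centers)

# Reorder `centers` so that its rows line up with the rows of `means`
centers_reordered <- centers[perm, , drop = FALSE]
```

Because every row and column is blanked out once it is used, `perm` is always a permutation of the row indexes, so no center can be assigned twice. If the total distance has to be minimal rather than just reasonable, this greedy loop would need to be replaced by a proper assignment-problem solver.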