I have cluster labels from k-means runs on different ids (reprex below). The problem is that the cluster codes are not ordered consistently across ids, although every id has 3 clusters.

reprex = data.frame(id = rep(1:2, each = 4), 
           v1 = rep(seq(1:4), 2),
           cluster = c(2,2,1,3,3,1,2,2))

reprex
   id v1 cluster
1  1  1       2
2  1  2       2
3  1  3       1
4  1  4       3
5  2  1       3
6  2  2       1
7  2  3       2
8  2  4       2

What I want is for the cluster variable to always start at 1 within each id, numbered in order of first appearance. Note that I don't want to reorder the data frame by cluster; the row order needs to stay the same. So the desired result would be:

reprex_desired<- data.frame(id = rep(1:2, each = 4), 
           v1 = rep(seq(1:4), 2),
           cluster = c(2,2,1,3,3,1,2,2),
           what_iWant = c(1,1,2,3,1,2,3,3))

reprex_desired
  id v1 cluster what_iWant
1  1  1       2          1
2  1  2       2          1
3  1  3       1          2
4  1  4       3          3
5  2  1       3          1
6  2  2       1          2
7  2  3       2          3
8  2  4       2          3

Sunderam Dubey
Myriad

2 Answers


We can use match() after grouping by 'id':

library(dplyr)
reprex <- reprex %>%
     group_by(id) %>% 
     mutate(what_IWant = match(cluster, unique(cluster))) %>%
     ungroup()

-output

reprex
# A tibble: 8 × 4
     id    v1 cluster what_IWant
  <int> <int>   <dbl>      <int>
1     1     1       2          1
2     1     2       2          1
3     1     3       1          2
4     1     4       3          3
5     2     1       3          1
6     2     2       1          2
7     2     3       2          3
8     2     4       2          3
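
To see why this works, it may help to apply the same expression outside dplyr. The sketch below is not part of the answer itself: it rebuilds the question's data and, as an assumption made only to keep the example package-free, uses base R's ave() in place of group_by()/mutate(). The helper name relabel_first_seen is hypothetical. Since unique() keeps labels in order of first appearance, match() returns each label's position in that order.

```r
# Rebuild the example data from the question
reprex <- data.frame(id = rep(1:2, each = 4),
                     v1 = rep(1:4, 2),
                     cluster = c(2, 2, 1, 3, 3, 1, 2, 2))

# Hypothetical helper: relabel values by order of first appearance
relabel_first_seen <- function(x) match(x, unique(x))

# For id 1 alone: unique() gives 2, 1, 3, so 2 -> 1, 1 -> 2, 3 -> 3
relabel_first_seen(c(2, 2, 1, 3))

# Apply the helper within each id; the row order is left untouched
reprex$what_iWant <- ave(reprex$cluster, reprex$id, FUN = relabel_first_seen)
reprex
```

The resulting what_iWant column should match the desired column from the question.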
akrun

Here is a version with cumsum combined with lag:

library(dplyr)
# df is assumed to be the reprex data frame from the question
df %>% 
  group_by(id) %>% 
  mutate(what_i_want = cumsum(cluster != lag(cluster, default = first(cluster))) + 1)

-output

     id    v1 cluster what_i_want
  <int> <int>   <dbl>       <dbl>
1     1     1       2           1
2     1     2       2           1
3     1     3       1           2
4     1     4       3           3
5     2     1       3           1
6     2     2       1           2
7     2     3       2           3
8     2     4       2           3
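
One point that may be worth checking before choosing between the two answers: the cumsum/lag version counts changes between consecutive rows, so it agrees with match() on the example data, where each label forms one contiguous run per id, but it would likely differ if a label reappeared later within the same id. The sketch below illustrates this on a made-up vector (not from the question), using a base R expression as a stand-in for the lag() comparison.

```r
# Labels for one hypothetical id where cluster 2 comes back after cluster 1
x <- c(2, 1, 2, 3)

# Base R stand-in for cumsum(x != lag(x, default = first(x))) + 1:
# compare each element with its predecessor; the first element never counts as a change
run_id <- cumsum(c(FALSE, x[-1] != x[-length(x)])) + 1

# First-appearance relabelling, as in the match() answer
first_seen <- match(x, unique(x))

run_id      # 1 2 3 4
first_seen  # 1 2 1 3
```

If labels can recur non-contiguously within an id, the match() version keeps one code per original cluster, while the run-based version assigns a new code at every change.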
TarJae