I have a dataset that looks like this:

tmp = data.frame(ID = c(rep("001-0001", 2), rep("312-0013", 4), rep("507-5673", 3)), VALUE=rnorm(9, 0, 1))
tmp

        ID      VALUE
1 001-0001 -0.2061521
2 001-0001 -1.6680686
3 312-0013  0.4566101
4 312-0013  0.2336220
5 312-0013 -0.8951357
6 312-0013  0.1074477
7 507-5673  0.6178272
8 507-5673  0.2903965
9 507-5673  0.2669414

The first column, "ID", is a character variable holding subject IDs, and each ID may repeat several times for the same subject. I would like to use dplyr to replace each distinct ID with a dummy integer (i.e., mask the character variable), so that the result looks like this:

tmp2 = data.frame(ID = c(rep(1, 2), rep(2, 4), rep(3, 3)), VALUE=rnorm(9, 0, 1))
tmp2

  ID      VALUE
1  1 -0.6441345
2  1 -1.0736110
3  2 -1.2887961
4  2 -0.4763107
5  2  1.2315772
6  2 -1.6617005
7  3  0.7437519
8  3  0.4047608
9  3 -0.3827181

Thanks!

alittleboy

1 Answer

In base R, we can use match() against the unique values of 'ID', which numbers the IDs in order of first appearance:

tmp$ID <- with(tmp, match(ID, unique(ID)))
tmp$ID
#[1] 1 1 2 2 2 2 3 3 3
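
To illustrate what match() does when the IDs are not already sorted, here is a small sketch. The vector `ids` is made up for this example and is not from the question. It compares numbering by first appearance with numbering by sorted order (via factor), in case the latter is what you want:

```r
# Hypothetical IDs, deliberately not in sorted order
ids <- c("507-5673", "001-0001", "507-5673", "312-0013", "001-0001")

# Number each ID by the position of its first appearance
by_appearance <- match(ids, unique(ids))
by_appearance
#[1] 1 2 1 3 2

# Number each ID by its rank among the sorted distinct IDs
by_sorted <- as.integer(factor(ids))
by_sorted
#[1] 3 1 3 2 1
```

For the data in the question the two approaches give the same result, because the IDs already appear in sorted order.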

With dplyr (version 1.0.0 or later), cur_group_id() can be used:

library(dplyr)
tmp %>%
    group_by(ID) %>%
    mutate(ID = cur_group_id()) %>%
    ungroup()
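
One thing to be aware of: as far as I know, group_by() orders groups by their sorted key values, so cur_group_id() numbers the IDs in sorted order rather than in order of first appearance. The sketch below uses a made-up data frame `df` and hypothetical column names `BY_APPEARANCE` and `BY_GROUP` (none of these are from the question) to show the two numberings side by side:

```r
library(dplyr)

# Hypothetical data whose IDs are not in sorted order
df <- data.frame(ID = c("507-5673", "001-0001", "507-5673", "312-0013"),
                 VALUE = c(0.1, 0.2, 0.3, 0.4))

res <- df %>%
    # numbering by first appearance, no grouping needed
    mutate(BY_APPEARANCE = match(ID, unique(ID))) %>%
    group_by(ID) %>%
    # numbering by group index, which follows the sorted group keys
    mutate(BY_GROUP = cur_group_id()) %>%
    ungroup()

res
```

With the data in the question the IDs are already sorted, so both columns would be identical there.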
akrun