Good morning everyone. I have a data frame built like this:

df=data.frame("Description"=c("Miriam","Miriam","Miriam","Trump","Trump","Trump","Right","Right","Right","Sara","Sara","Star","Star","Star","Sandra"))

I would like to create a loop that adds a new column in which every row with the same name is assigned the same sample number, to obtain this result:

Description SampleID
Miriam  sample1
Miriam  sample1
Miriam  sample1
Trump   sample2
Trump   sample2
Trump   sample2
Right   sample3
Right   sample3
Right   sample3
Sara    sample4
Sara    sample4
Star    sample5
Star    sample5
Star    sample5
Sandra  sample6

Does anyone know how to do that? Thanks a lot to everyone who helps. Andrea

Dr.PhilCol
    There must be a dupe for this question... `paste0("sample", data.table::rleid(df$Description))` or `dplyr::group_indices(df, Description)` – zx8754 Apr 12 '19 at 09:26

3 Answers

One dplyr possibility could be:

library(dplyr)

# Start a new sample number each time Description differs from the previous row
df %>%
 mutate(SampleID = paste0("sample", 
                   cumsum(Description != lag(Description, default = first(Description))) + 1))

   Description SampleID
1       Miriam  sample1
2       Miriam  sample1
3       Miriam  sample1
4        Trump  sample2
5        Trump  sample2
6        Trump  sample2
7        Right  sample3
8        Right  sample3
9        Right  sample3
10        Sara  sample4
11        Sara  sample4
12        Star  sample5
13        Star  sample5
14        Star  sample5
15      Sandra  sample6
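
For readers who prefer to avoid the dplyr dependency, the same run-based numbering can be sketched in base R. This is only an illustration, assuming (as in the question) that rows with the same name sit next to each other; the helper name `changed` is made up for this example, and `stringsAsFactors = FALSE` is set so the result does not depend on the R version.

```r
# Rebuild the data frame from the question (same names, same order).
df <- data.frame(
  Description = rep(c("Miriam", "Trump", "Right", "Sara", "Star", "Sandra"),
                    times = c(3, 3, 3, 2, 3, 1)),
  stringsAsFactors = FALSE
)

# TRUE wherever a row differs from the row before it; the first row never counts.
changed <- c(FALSE, df$Description[-1] != df$Description[-nrow(df)])

# The running count of changes, plus one, numbers each consecutive run.
df$SampleID <- paste0("sample", cumsum(changed) + 1)
```

Like the dplyr version, this numbers consecutive runs, so a name that reappears later in the column would get a new number.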
tmfmnk

We can use match to look up each value of Description in the vector of its unique values, which gives a unique ID per name in order of first appearance, and then paste that ID onto "Sample".

df$SampleID <- paste0("Sample", match(df$Description, unique(df$Description)))


df
#   Description SampleID
#1       Miriam  Sample1
#2       Miriam  Sample1
#3       Miriam  Sample1
#4        Trump  Sample2
#5        Trump  Sample2
#6        Trump  Sample2
#7        Right  Sample3
#8        Right  Sample3
#9        Right  Sample3
#10        Sara  Sample4
#11        Sara  Sample4
#12        Star  Sample5
#13        Star  Sample5
#14        Star  Sample5
#15      Sandra  Sample6
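
One point worth checking before choosing between this and a run-based approach (such as the cumsum/lag one) is what should happen when a name comes back later in the column. A small sketch on a made-up vector `x` (hypothetical data, not from the question) shows the difference:

```r
# Hypothetical input in which "Miriam" appears in two separate places.
x <- c("Miriam", "Trump", "Miriam")

# match() numbers names by first appearance, so both "Miriam" rows share an ID.
by_first_seen <- match(x, unique(x))

# Counting changes between neighbours numbers each consecutive run separately.
by_run <- cumsum(c(TRUE, x[-1] != x[-length(x)]))

by_first_seen  # 1 2 1
by_run         # 1 2 3
```

On the data in the question both give the same numbering, because equal names are always adjacent there.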
Ronak Shah

Your column is already a factor (which is really an integer vector of codes pointing into the factor's levels), so you just need to put the levels in the order you want and use as.numeric:

df$sampleID <- paste0("Sample", 
                      as.numeric(factor(df$Description, 
                                        levels=unique(df$Description), ordered=TRUE)))

df
#   Description sampleID
#1       Miriam  Sample1
#2       Miriam  Sample1
#3       Miriam  Sample1
#4        Trump  Sample2
#5        Trump  Sample2
#6        Trump  Sample2
#7        Right  Sample3
#8        Right  Sample3
#9        Right  Sample3
#10        Sara  Sample4
#11        Sara  Sample4
#12        Star  Sample5
#13        Star  Sample5
#14        Star  Sample5
#15      Sandra  Sample6

NB:

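If you apply as.numeric to your column without doing anything else, you already get an index for each name, just not in the order you want, because the default levels are sorted alphabetically. (This relies on Description being a factor, which was the data.frame default before R 4.0.0; on a character column as.numeric returns NA.)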

as.numeric(df$Description)
# [1] 1 1 1 6 6 6 2 2 2 4 4 5 5 5 3
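
To make the role of the levels explicit, here is a minimal sketch on a short made-up vector `x` (hypothetical data; the variable names are also just for illustration). It compares the default, sorted levels with levels taken in order of appearance:

```r
# Hypothetical input, deliberately not in alphabetical order.
x <- c("Trump", "Miriam", "Trump", "Sandra")

# Default: levels are the sorted unique values (Miriam, Sandra, Trump).
f_default <- factor(x)
codes_default <- as.integer(f_default)

# Explicit: levels follow order of first appearance (Trump, Miriam, Sandra).
f_seen <- factor(x, levels = unique(x))
codes_seen <- as.integer(f_seen)

codes_default  # 3 1 3 2
codes_seen     # 1 2 1 3
```

as.integer is used here instead of as.numeric only to get integer codes directly; the values are the same.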
Cath