How to replace duplicate row values by appending indexes in R using dplyr?

Question

I would like to replace duplicate row values in a given column by appending an underscore and an index based on their order of occurrence. For example:

old_df_col   new_df_col
object       object_1
object       object_2
object       object_3
object       object_4

Most other questions focus on deleting duplicate values or replacing them with NA, so I wasn't able to find an implementation using R and dplyr.

Here's what I've worked out so far:

# count duplicates
mtcars %>% group_by(carb) %>% summarize(n=n())

# filter duplicates
mtcars %>% group_by(carb) %>% filter(n()>1)

I think you are looking for this https://stackoverflow.com/questions/12925063/numbering-rows-within-groups-in-a-data-frame — Ronak Shah, Jul 11 '21 at 11:31

score 2 · Accepted Answer · edited Jul 11 '21 at 20:08

You can group by the target variable and use row_number() to create the sequence.

You may want to sort the data set first (using arrange()) so that the sequence is meaningful for your data, but this is not strictly necessary.

library(dplyr)

mtcars %>% group_by(carb) %>% 
  arrange(carb, cyl, mpg, hp) %>% 
  mutate(
    carb_seq = paste("carb", carb, "seq", row_number(), sep = "_")
  )
# A tibble: 32 x 12
# Groups:   carb [6]
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb carb_seq    
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>       
 1  21.5     4 120.     97  3.7   2.46  20.0     1     0     3     1 carb_1_seq_1
 2  22.8     4 108      93  3.85  2.32  18.6     1     1     4     1 carb_1_seq_2
 3  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1 carb_1_seq_3
 4  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1 carb_1_seq_4
 5  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1 carb_1_seq_5
 6  18.1     6 225     105  2.76  3.46  20.2     1     0     3     1 carb_1_seq_6
 7  21.4     6 258     110  3.08  3.22  19.4     1     0     3     1 carb_1_seq_7
 8  21.4     4 121     109  4.11  2.78  18.6     1     1     4     2 carb_2_seq_1
 9  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2 carb_2_seq_2
10  24.4     4 147.     62  3.69  3.19  20       1     0     4     2 carb_2_seq_3
# … with 22 more rows

Created on 2021-07-11 by the reprex package (v2.0.0)
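
To match the exact layout shown in the question, the same group_by() plus row_number() idea can be applied to a single character column. The following is only a minimal sketch: the data frame df and the column new_if_dup are hypothetical names made up for illustration, and the second column assumes that values occurring only once should be left unchanged, which the question does not state.

```r
library(dplyr)

# Hypothetical toy data mirroring the example in the question,
# plus one value ("other") that occurs only once
df <- tibble(old_df_col = c("object", "object", "other", "object", "object"))

result <- df %>%
  group_by(old_df_col) %>%
  mutate(
    # Suffix every value with its position inside its group
    new_df_col = paste(old_df_col, row_number(), sep = "_"),
    # Assumed variant: only suffix values that occur more than once
    new_if_dup = if (n() > 1) {
      paste(old_df_col, row_number(), sep = "_")
    } else {
      old_df_col
    }
  ) %>%
  ungroup()

result
```

A grouped mutate() keeps the original row order, so no arrange() is needed here. The final ungroup() simply avoids carrying the grouping into later steps.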
