Identify duplicates and make column with common id

Question

I have a df

df <- data.frame(ID = c('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'),
                 var1 = c(1, 1, 3, 4, 5, 5, 7, 8),
                 var2 = c(1, 1, 0, 0, 1, 1, 0, 0),
                 var3 = c(50, 50, 30, 47, 33, 33, 70, 46))

Columns var1 to var3 are numerical inputs to a modelling program. To save computing time, I would like to simulate only the unique combinations of var1 to var3 in the modelling software, then join the results back to the main data frame using left_join.

I need to add a second identifier to each row that shows which other rows share the same values of var1 to var3. The desired output looks like this:

  ID var1 var2 var3 ID2
1  a    1    1   50 ab
2  b    1    1   50 ab
3  c    3    0   30 c
4  d    4    0   47 d
5  e    5    1   33 ef
6  f    5    1   33 ef
7  g    7    0   70 g
8  h    8    0   46 h

Then I can subset the unique rows of var1 to var3 and ID2, simulate them in the software, and join the results back to the main df using the new ID2.

score 2 · Answer 1 · answered Jan 31 '23 at 10:41

With paste:

library(dplyr) #1.1.0
df %>%
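  # Group on all three model inputs so rows only share an ID2 when var1, var2 and var3 all match
  mutate(ID2 = paste(unique(ID), collapse = ""),
         .by = c(var1, var2, var3))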

#   ID var1 var2 var3 ID2
# 1  a    1    1   50  ab
# 2  b    1    1   50  ab
# 3  c    3    0   30   c
# 4  d    4    0   47   d
# 5  e    5    1   33  ef
# 6  f    5    1   33  ef
# 7  g    7    0   70   g
# 8  h    8    0   46   h

Note that the .by argument is a new feature of dplyr 1.1.0. You can still use group_by and ungroup with earlier versions and/or if you have a more complex pipeline.
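
For completeness, here is a rough sketch of the group_by/ungroup version together with the subset-and-join step described in the question. It assumes the grouping should cover all of var1, var2 and var3. The names df_ids, to_run, sim_results and df_out, and the result column, are made up for illustration; the arithmetic in sim_results is only a stand-in for whatever the modelling software returns.

```r
library(dplyr)

df <- data.frame(ID = c('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'),
                 var1 = c(1, 1, 3, 4, 5, 5, 7, 8),
                 var2 = c(1, 1, 0, 0, 1, 1, 0, 0),
                 var3 = c(50, 50, 30, 47, 33, 33, 70, 46))

# Same grouping without .by, so it also runs on dplyr versions before 1.1.0
df_ids <- df %>%
  group_by(var1, var2, var3) %>%
  mutate(ID2 = paste(unique(ID), collapse = "")) %>%
  ungroup()

# One row per unique combination of inputs: these are the cases to simulate
to_run <- df_ids %>%
  distinct(ID2, var1, var2, var3)

# Stand-in for the modelling software output (hypothetical `result` column)
sim_results <- to_run %>%
  mutate(result = var1 + var2 + var3) %>%
  select(ID2, result)

# Attach the simulated result to every original row via the shared ID2
df_out <- left_join(df_ids, sim_results, by = "ID2")
```

Joining on ID2 alone should be enough here because each ID2 maps to exactly one combination of var1 to var3, provided the original ID values are unique.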

This ended up working: df %>% group_by(var1, var2) %>% mutate(new_ID = paste0(ID, collapse = "")) — Loz, Jan 31 '23 at 10:53
