1

I would like to add a column that indicates if there are 2 duplicates observed in 2 rows in a dataframe. Let me give an example. Here is sample data.

df
id     date         n   var
01-05  12-04-1985   1   sleep
02-06  11-04-2000   10  epoch
01-05  12-04-1985   1   epoch
12-10  01-08-2010   4   sleep

Note that rows 1 and 3 have the same id and date. I want to create an indicator in a new column (var) that has "sleep/epoch" for instances where the rows with id and date match, like so:

df
id     date         n    var  
01-05  12-04-1985   1    sleep/epoch  
02-06  11-04-2000   10   epoch
01-05  12-04-1985   1    sleep/epoch
12-10  01-08-2010   4    sleep

I tried to do this using ifelse, but I had trouble doing so. Any advice on getting this to work?

D. Fowler
  • 601
  • 3
  • 7
  • Possible duplicate https://stackoverflow.com/questions/15933958/collapse-concatenate-aggregate-a-column-to-a-single-comma-separated-string-w – Ronak Shah Dec 03 '20 at 03:37

1 Answers1

0

We can use a group by operation i.e. grouping by 'id', 'date', paste the unique values in 'var' by collapseing into a single string separated by "/"

library(dplyr)
library(stringr)
df1 <- df %>%
    group_by(id, date) %>% 
    mutate(var = str_c(unique(var), collapse="/")) %>%
    ungroup

-output

df1
# A tibble: 4 x 4
#  id    date           n var        
#  <chr> <chr>      <int> <chr>      
#1 01-05 12-04-1985     1 sleep/epoch
#2 02-06 11-04-2000    10 epoch      
#3 01-05 12-04-1985     1 sleep/epoch
#4 12-10 01-08-2010     4 sleep      

Or using base R with ave

df$var <- with(df, ave(var, id, date, FUN = function(x) 
            paste(unique(x), collapse="/")))

data

df <- structure(list(id = c("01-05", "02-06", "01-05", "12-10"), 
date = c("12-04-1985", 
"11-04-2000", "12-04-1985", "01-08-2010"), n = c(1L, 10L, 1L, 
4L), var = c("sleep", "epoch", "epoch", "sleep")), 
class = "data.frame", row.names = c(NA, 
-4L))
akrun
  • 874,273
  • 37
  • 540
  • 662