
I want to collapse duplicated rows, by unique record ID, so that the unique values spread across those rows are consolidated. Some variables are filled in on one copy of a duplicated row, while other variables are filled in on a different row of the same record. I'm working in R. I'd like each record to end up on a single row, without losing any of the filled-in columns: one "sum-total" row that gathers every value that may have been entered across different rows, so the final row is no longer a duplicate and shows all the variables together.

I've looked into merge and bind, and I've thought about writing an if rule, but the duplication varies by record (see example).

record  Var1  var2  var3  var4  var5
     2     1     1    NA    NA    NA
     2    NA    NA     1     1     1
     3     2     2    NA    NA    NA
     3    NA    NA     2    NA    NA
     3    NA    NA    NA     2     2
     4     1     1    NA    NA    NA
     5    NA    NA     1     1     1
     5    NA     2    NA    NA    NA

desired output example of record 2:

record  Var1  var2  var3  var4  var5
     2     1     1     1     1     1
3 .... 
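For reference, the example data above can be rebuilt as a data frame (column names as shown in the table; the object name `df` is just a placeholder):

```r
# Reconstruct the example data from the question
df <- data.frame(
  record = c(2, 2, 3, 3, 3, 4, 5, 5),
  Var1   = c(1, NA, 2, NA, NA, 1, NA, NA),
  var2   = c(1, NA, 2, NA, NA, 1, NA, 2),
  var3   = c(NA, 1, NA, 2, NA, NA, 1, NA),
  var4   = c(NA, 1, NA, NA, 2, NA, 1, NA),
  var5   = c(NA, 1, NA, NA, 2, NA, 1, NA)
)
```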

1 Answer


With base R's aggregate:

aggregate(df[2:ncol(df)], by = df["record"], sum, na.rm = TRUE)

#### OUTPUT ####

  record Var1 var2 var3 var4 var5
1      2    1    1    1    1    1
2      3    2    2    2    2    2
3      4    1    1    0    0    0
4      5    0    2    1    1    1

With dplyr:

library(dplyr)

df %>% group_by(record) %>% summarize_all(sum, na.rm = TRUE)


#### OUTPUT ####
# A tibble: 4 x 6
  record  Var1  var2  var3  var4  var5
   <int> <int> <int> <int> <int> <int>
1      2     1     1     1     1     1
2      3     2     2     2     2     2
3      4     1     1     0     0     0
4      5     0     2     1     1     1

The only catch is that NAs are turned into 0s: `sum(..., na.rm = TRUE)` returns 0 when every value in a group is NA. But it's easy to change them back.
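One way to avoid that conversion entirely is to wrap `sum` so it keeps NA when a whole group is missing. A small sketch (the helper name `sum_keep_na` is my own, not from the answer):

```r
# Helper: like sum(..., na.rm = TRUE), but returns NA (not 0)
# when every value in the group is missing
sum_keep_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

# Same aggregation as above, but all-NA groups stay NA
# (e.g. Var1 for record 5, var3-var5 for record 4)
aggregate(df[2:ncol(df)], by = df["record"], sum_keep_na)
```

Returning `NA_real_` rather than plain `NA` keeps the result columns numeric instead of mixing in logical NAs.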
