
I want to collapse duplicated rows, by unique record ID, so that the unique values spread across those rows are consolidated. Some variables are filled in on one copy of a duplicated row, while other variables are filled in on a different row of the same record. I'm working in R. I'd like each record to end up on a single row, without losing any of the filled-in columns: one "sum-total" row that gathers every value that may have been entered across different rows, so the final row is no longer a duplicate and shows all the variables together.

I've looked into merge and bind, and I've thought about writing an if rule, but the duplication varies by record (see example).

record  Var1  var2  var3  var4  var5
     2     1     1    NA    NA    NA
     2    NA    NA     1     1     1
     3     2     2    NA    NA    NA
     3    NA    NA     2    NA    NA
     3    NA    NA    NA     2     2
     4     1     1    NA    NA    NA
     5    NA    NA     1     1     1
     5    NA     2    NA    NA    NA

desired output example of record 2:

record  Var1  var2  var3  var4  var5
     2     1     1     1     1     1
3 .... 
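For reference, the example data above can be rebuilt as a data frame (column names as shown in the table; the object name `df` is just a placeholder):

```r
# Reconstruct the example data from the question
df <- data.frame(
  record = c(2, 2, 3, 3, 3, 4, 5, 5),
  Var1   = c(1, NA, 2, NA, NA, 1, NA, NA),
  var2   = c(1, NA, 2, NA, NA, 1, NA, 2),
  var3   = c(NA, 1, NA, 2, NA, NA, 1, NA),
  var4   = c(NA, 1, NA, NA, 2, NA, 1, NA),
  var5   = c(NA, 1, NA, NA, 2, NA, 1, NA)
)
```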

1 Answer


With base R's aggregate:

aggregate(df[2:ncol(df)], by = df["record"], sum, na.rm = TRUE)

#### OUTPUT ####

  record Var1 var2 var3 var4 var5
1      2    1    1    1    1    1
2      3    2    2    2    2    2
3      4    1    1    0    0    0
4      5    0    2    1    1    1

With dplyr:

library(dplyr)

df %>% group_by(record) %>% summarize_all(sum, na.rm = TRUE)


#### OUTPUT ####
# A tibble: 4 x 6
  record  Var1  var2  var3  var4  var5
   <int> <int> <int> <int> <int> <int>
1      2     1     1     1     1     1
2      3     2     2     2     2     2
3      4     1     1     0     0     0
4      5     0     2     1     1     1

The only catch is that NAs are turned into 0s: `sum(..., na.rm = TRUE)` returns 0 when every value in a group is NA. But it's easy to change them back.
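One way to avoid that conversion entirely is to wrap `sum` so it keeps NA when a whole group is missing. A small sketch (the helper name `sum_keep_na` is my own, not from the answer):

```r
# Helper: like sum(..., na.rm = TRUE), but returns NA (not 0)
# when every value in the group is missing
sum_keep_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

# Same aggregation as above, but all-NA groups stay NA
# (e.g. Var1 for record 5, var3-var5 for record 4)
aggregate(df[2:ncol(df)], by = df["record"], sum_keep_na)
```

Returning `NA_real_` rather than plain `NA` keeps the result columns numeric instead of mixing in logical NAs.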
