Find rows in data frame with certain columns are duplicated, then combine the the elements in other columns

Question

I have one data frame, I want to find the rows where both columns A and B are duplicated, and then combine the rows by combing the elements in C column together.

My example:

 DF = cbind.data.frame(A = c(1, 1, 2, 3, 3), 
                       B = c("a", "b", "a", "c", "c"), 
                       C = c("M", "N", "X", "M", "N"))

My expected result:

 DFE = cbind.data.frame(A = c(1, 1, 2, 3), 
                        B = c("a", "b", "a", "c"), 
                        C = c("M", "N", "X", "M; N"))

Thanks a lot

arg0naut91 · Answer 1 · 2019-03-19T13:16:13.987

3

Without packages:

DF <- aggregate(C ~ A + B, FUN = function(x) paste(x, collapse = "; "), data = DF)

Output:

  A B    C
1 1 a    M
2 2 a    X
3 1 b    N
4 3 c M; N

Or with data.table:

setDT(DF)[, .(C = paste(C, collapse = "; ")), by = .(A, B)]

edited Mar 19 '19 at 13:16

answered Mar 19 '19 at 13:09

arg0naut91

14,574
2
17
38

score 2 · Accepted Answer · edited Mar 19 '19 at 13:10

This is a tidyverse based solution where you can use paste with collapse after grouping it.

library(dplyr)
DF = cbind.data.frame(A = c(1, 1, 2, 3, 3), 
                      B = c("a", "b", "a", "c", "c"), 
                      C = c("M", "N", "X", "M", "N"))


DFE = cbind.data.frame(A = c(1, 1, 2, 3), 
                       B = c("a", "b", "a", "c"), 
                       C = c("M", "N", "X", "M; N"))


DF %>% 
  group_by(A,B) %>% 
  summarise(C = paste(C, collapse = ";"))
#> # A tibble: 4 x 3
#> # Groups:   A [3]
#>       A B     C    
#>   <dbl> <fct> <chr>
#> 1     1 a     M    
#> 2     1 b     N    
#> 3     2 a     X    
#> 4     3 c     M;N

^{Created on 2019-03-19 by the reprex package (v0.2.1)}

Find rows in data frame with certain columns are duplicated, then combine the the elements in other columns

2 Answers2