1

I have one data frame, I want to find the rows where both columns A and B are duplicated, and then combine the rows by combing the elements in C column together.

My example:

 DF = cbind.data.frame(A = c(1, 1, 2, 3, 3), 
                       B = c("a", "b", "a", "c", "c"), 
                       C = c("M", "N", "X", "M", "N"))

My expected result:

 DFE = cbind.data.frame(A = c(1, 1, 2, 3), 
                        B = c("a", "b", "a", "c"), 
                        C = c("M", "N", "X", "M; N"))

Thanks a lot

Wang
  • 1,314
  • 14
  • 21

2 Answers2

3

Without packages:

DF <- aggregate(C ~ A + B, FUN = function(x) paste(x, collapse = "; "), data = DF)

Output:

  A B    C
1 1 a    M
2 2 a    X
3 1 b    N
4 3 c M; N

Or with data.table:

setDT(DF)[, .(C = paste(C, collapse = "; ")), by = .(A, B)]
arg0naut91
  • 14,574
  • 2
  • 17
  • 38
2

This is a tidyverse based solution where you can use paste with collapse after grouping it.

library(dplyr)
DF = cbind.data.frame(A = c(1, 1, 2, 3, 3), 
                      B = c("a", "b", "a", "c", "c"), 
                      C = c("M", "N", "X", "M", "N"))


DFE = cbind.data.frame(A = c(1, 1, 2, 3), 
                       B = c("a", "b", "a", "c"), 
                       C = c("M", "N", "X", "M; N"))


DF %>% 
  group_by(A,B) %>% 
  summarise(C = paste(C, collapse = ";"))
#> # A tibble: 4 x 3
#> # Groups:   A [3]
#>       A B     C    
#>   <dbl> <fct> <chr>
#> 1     1 a     M    
#> 2     1 b     N    
#> 3     2 a     X    
#> 4     3 c     M;N

Created on 2019-03-19 by the reprex package (v0.2.1)

NelsonGon
  • 13,015
  • 7
  • 27
  • 57
amrrs
  • 6,215
  • 2
  • 18
  • 27