2

This is an extension to post Collapse / concatenate / aggregate a column to a single comma separated string within each group

Goal: aggregate multiple columns according to one grouping variable and separate individual values by separator of choice.

Reproducible example:

data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = c(rep(c(100), 3), rep(200,3)), C = rep(c(1,2,NA),2), D = c(15:20), E = rep(c(1,NA,NA),2))
data
    A   B  C  D  E
1 111 100  1 15  1
2 111 100  2 16 NA
3 111 100 NA 17 NA
4 222 200  1 18  1
5 222 200  2 19 NA
6 222 200 NA 20 NA

A is the grouping variable but B is still displayed in overall result (B depends on A in my application) and C, D and E are the variables to be collapsed into separated character strings.

Desired Output

    A   B  C    D         E
1 111 100  1,2  15,16,17  1
2 222 100  1,2  18,19,20  1    

I don't have a ton of experience with R. I did try to expand upon the solutions posted by G. Grothendieck to the linked post to meet my requirements but can't quite get it right for multiple columns.

What would be a proper implementation to get the desired output?

I focused specifically on group_by and summarise_all and aggregate in my attempts. They are a complete mess so I don't believe it would even be helpful to display.

EDIT: Solutions posted work great at displaying desired result! To continue improving the value in this post for those that find it.

How would it be possible for users to select their own separation characters. e.g. '-', '\n' The current solutions by @akrun and @tmfmnk both result in lists instead of a concatenated character string. Please correct me if I said this incorrectly.

data$D
[1] 15 16 17 18 19 20
> data$A
[1] 111 111 111 222 222 222
> data$B
[1] 100 100 100 200 200 200
> data$C
[1]  1  2 NA  1  2 NA
> data$D
[1] 15 16 17 18 19 20
> data$E
[1]  1 NA NA  1 NA NA
callmeaj04
  • 79
  • 7
  • Not sure what you mean at the end, since it seems like both their answers give strings, just strings with commas in them. If you want to try different separators or have some user interface, you can pass a variable as the collapse argument to `paste` or `str_c`. Beyond that, [this](https://stackoverflow.com/q/21644848/5325862) or the posts it links to should get you from doing one summary column to doing multiple at once – camille Feb 14 '20 at 22:00
  • Hi Camille, I was trying to pass the data to DT's datatable. However, it still resulted in a row per each occurrence but that might be a question specific to DT. I remember seeing at some point someone posting a question about exporting as csv vs as pdf and the format necessary for it. Even though it outputs correctly in the console the format is still different. – callmeaj04 Feb 17 '20 at 14:45

2 Answers2

2

With dplyr, you can do:

data %>%
 group_by(A, B) %>%
 summarise_all(~ toString(na.omit(.)))

      A     B C     D          E    
  <dbl> <dbl> <chr> <chr>      <chr>
1   111   100 1, 2  15, 16, 17 1    
2   222   200 1, 2  18, 19, 20 1 
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
2

We can group by 'A', 'B', and use summarise_at to paste all the non-NA elements

library(dplyr)
data %>% 
    group_by(A, B) %>%
    summarise_at(vars(-group_cols()), ~ toString(.[!is.na(.)]))
# A tibble: 2 x 5
# Groups:   A [2]
#      A     B C     D          E    
#  <dbl> <dbl> <chr> <chr>      <chr>
#1   111   100 1, 2  15, 16, 17 1    
#2   222   200 1, 2  18, 19, 20 1   

If we need to pass custom delimiter, use paste or str_c

library(stringr)
data %>% 
    group_by(A, B) %>%
    summarise_at(vars(-group_cols()), ~ str_c(.[!is.na(.)], collapse="_"))

Or using base R with aggregate

aggregate(. ~ A + B, data, FUN = function(x) 
      toString(x[!is.na(x)]), na.action = NULL)
akrun
  • 874,273
  • 37
  • 540
  • 662