How to keep all columns when concatenating rows with dplyr::summarise?

Question

I want to aggregate one column (C) in a data frame according to one grouping variable A, and separate the individual values by a comma while keeping all the other column B. However, B can either have a character (which is always the same for all the rows) or be empty. In this case, I would like to keep the character whenever it is present on one row.

Here is a simplified example:

data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = c("", "", "", "a" , "", "a"), C = c(5:10))
data

Based on this question Collapse / concatenate / aggregate a column to a single comma separated string within each group, I have the following code:

library(dplyr)
data %>%
  group_by(A) %>%
  summarise(test = toString(C)) %>%
  ungroup()

Here it is what I would like to obtain:

    A B  C
1 111    5,6,7
2 222 a  8,9,10

It is keeping also the B column. Can you provide the expected results? — tmfmnk, Jan 07 '20 at 13:26
I realised that my question was way too simplified. I've edited accordingly to make it more clear. Sorry to all for my mistake... — Euclides, Jan 07 '20 at 13:55

Louis · Answer 1 · 2020-01-07T14:32:38.733

Use summarise_all()

To keep all your columns, you can use summarise_all():

data %>% 
    group_by(A) %>% 
    summarise_all(toString)

# A tibble: 2 x 3
      A B       C       
  <dbl> <chr>   <chr>   
1   111 1, 2, 1 5, 6, 7 
2   222 2, 1, 2 8, 9, 10

Edit for updated question

You can add a B column to summarise to achieve the desided results:

data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = c("", "", "", "a" , "", "a"), C = c(5:10))
data

library(dplyr)
data %>%
  group_by(A) %>%
  summarise(B = names(sort(table(B),decreasing=TRUE))[1], 
            C = toString(C)) %>%
  ungroup()
# A tibble: 2 x 3
      A B     C       
  <dbl> <fct> <chr>   
1   111 ""    5, 6, 7 
2   222 a     8, 9, 10

This will return the most frequent value in B column (as order gives you ordered indexes).

Hope this helps.

This doesn't seem to fully work. The column `B` of the 2nd row is empty instead of `a` — Euclides, Jan 07 '20 at 14:32
@Euclides Yes, there was an error in the previous code. I've tested this and it works. — Louis, Jan 07 '20 at 14:33

score 1 · Accepted Answer · answered Jan 07 '20 at 14:09

You could write one function to return unique values

library(dplyr)

get_common_vars <- function(x) {
   if(n_distinct(x) > 1) unique(x[x !='']) else unique(x)
}

and then use it on all columns that you are interested :

data %>% 
  group_by(A) %>% 
  mutate(C = toString(C)) %>%
  summarise_at(vars(B:C), get_common_vars)
  #                  ^------ Include all columns here

# A tibble: 2 x 3
#     A  B     C       
#  <dbl> <fct> <chr>   
#1   111 ""    5, 6, 7 
#2   222 a     8, 9, 10

thanks for your answer! this does exactly what I need. – Euclides Jan 07 '20 at 14:30 — Euclides, Jan 07 '20 at 14:30

score 1 · Answer 3 · answered Jan 07 '20 at 14:50

You can also use the paste() function and leverage the collapse argument.

data %>% 
  group_by(A) %>% 
  summarise(
    B = paste(unique(B), collapse = ""),
    C = paste(C, collapse = ", "))

# A tibble: 2 x 3
  A     B     C       
  <chr> <chr> <chr>   
1 111   ""    5, 6, 7 
2 222   a     8, 9, 10

How to keep all columns when concatenating rows with dplyr::summarise?

3 Answers3

Use summarise_all()

Edit for updated question