0

I want to aggregate one column (C) in a data frame according to one grouping variable A, and separate the individual values by a comma while keeping all the other column B. However, B can either have a character (which is always the same for all the rows) or be empty. In this case, I would like to keep the character whenever it is present on one row.

Here is a simplified example:

data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = c("", "", "", "a" , "", "a"), C = c(5:10))
data

Based on this question Collapse / concatenate / aggregate a column to a single comma separated string within each group, I have the following code:

library(dplyr)
data %>%
  group_by(A) %>%
  summarise(test = toString(C)) %>%
  ungroup()

Here it is what I would like to obtain:

    A B  C
1 111    5,6,7
2 222 a  8,9,10
Euclides
  • 99
  • 1
  • 7
  • It is keeping also the B column. Can you provide the expected results? – tmfmnk Jan 07 '20 at 13:26
  • I realised that my question was way too simplified. I've edited accordingly to make it more clear. Sorry to all for my mistake... – Euclides Jan 07 '20 at 13:55

3 Answers3

2

Use summarise_all()

To keep all your columns, you can use summarise_all():

data %>% 
    group_by(A) %>% 
    summarise_all(toString)

# A tibble: 2 x 3
      A B       C       
  <dbl> <chr>   <chr>   
1   111 1, 2, 1 5, 6, 7 
2   222 2, 1, 2 8, 9, 10

Edit for updated question

You can add a B column to summarise to achieve the desided results:

data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = c("", "", "", "a" , "", "a"), C = c(5:10))
data

library(dplyr)
data %>%
  group_by(A) %>%
  summarise(B = names(sort(table(B),decreasing=TRUE))[1], 
            C = toString(C)) %>%
  ungroup()
# A tibble: 2 x 3
      A B     C       
  <dbl> <fct> <chr>   
1   111 ""    5, 6, 7 
2   222 a     8, 9, 10

This will return the most frequent value in B column (as order gives you ordered indexes).

Hope this helps.

Louis
  • 3,592
  • 2
  • 10
  • 18
1

You could write one function to return unique values

library(dplyr)

get_common_vars <- function(x) {
   if(n_distinct(x) > 1) unique(x[x !='']) else unique(x)
}

and then use it on all columns that you are interested :

data %>% 
  group_by(A) %>% 
  mutate(C = toString(C)) %>%
  summarise_at(vars(B:C), get_common_vars)
  #                  ^------ Include all columns here

# A tibble: 2 x 3
#     A  B     C       
#  <dbl> <fct> <chr>   
#1   111 ""    5, 6, 7 
#2   222 a     8, 9, 10
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
1

You can also use the paste() function and leverage the collapse argument.

data %>% 
  group_by(A) %>% 
  summarise(
    B = paste(unique(B), collapse = ""),
    C = paste(C, collapse = ", "))

# A tibble: 2 x 3
  A     B     C       
  <chr> <chr> <chr>   
1 111   ""    5, 6, 7 
2 222   a     8, 9, 10
Wil
  • 3,076
  • 2
  • 12
  • 31