I have a data frame that looks like this:

> data <- data.frame(foo=c(1, 1, 2, 3, 3, 3), bar=c('a', 'b', 'a', 'b', 'c', 'd'))
> data
  foo bar
1   1   a
2   1   b
3   2   a
4   3   b
5   3   c
6   3   d

I would like to create a new column bars_by_foo which is the concatenation of the values of bar by foo. So the new data should look like this:

  foo bar bars_by_foo
1   1   a          ab
2   1   b          ab
3   2   a           a
4   3   b         bcd
5   3   c         bcd
6   3   d         bcd

I was hoping that the following would work:

p <- function(v) {
  Reduce(f=paste, x = v)
}
data %>% 
  group_by(foo) %>% 
  mutate(bars_by_foo=p(bar))

But that code gives me an error:

Error: incompatible types, expecting a character vector.

What am I doing wrong?

crf

4 Answers

160

You could simply do:

data %>% 
     group_by(foo) %>% 
     mutate(bars_by_foo = paste0(bar, collapse = "")) 

This needs no helper function.
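To see why the `collapse` argument matters here, the following is a minimal base-R sketch on a plain character vector (the variable names are only illustrative). Without `collapse`, `paste0` returns one string per input element; with it, the elements are joined into a single string, which `mutate` then repeats for every row in the group.

```r
bar <- c("b", "c", "d")

# Without collapse, paste0 is vectorised: one output string per input element.
no_collapse <- paste0(bar)

# With collapse, all elements are joined into a single string.
collapsed <- paste0(bar, collapse = "")

print(no_collapse)  # "b" "c" "d"
print(collapsed)    # "bcd"
```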

David Arenburg
  • 3
    Oh the collapse = "" is the difference! Without that it doesn't work, which is why I wrote the helper function. – crf Jul 21 '16 at 22:27
  • 2
    Also note that the collapse can be anything; it just can't be NULL. You can use collapse = " " or collapse = "," if you want. – Matt L. Feb 03 '18 at 22:32
  • 3
    In my use case this returned all the rows, but I wanted to keep only one per group; just throw in a `slice(1)` before you undo the `group_by`. – Amit Kohli Jun 08 '18 at 16:23
  • What should I do if I want only one row for each foo, with the foos summed? I tried `data %>% group_by(foo) %>% summarise(sum(foo)) %>% mutate(bars_by_foo = paste0(bar, collapse = ""))` but it gives me an error: object 'bar' not found – Dutschke Oct 22 '20 at 11:44
  • 3
    @Dutschke not sure, perhaps `data %>% group_by(foo) %>% summarise(sum_foo = sum(foo), bars_by_foo = paste0(bar, collapse = ""))`? – David Arenburg Oct 22 '20 at 13:09
7

It looks like there's a bit of an issue with the mutate function here. I've found that summarise is often the better approach when you're grouping data in dplyr (that's in no way a hard and fast rule, though).

The paste function also inserts a space between the pieces by default, so either set sep = "" or just use paste0.
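As a quick illustration of the separator behaviour, here is a small base-R sketch (the variable names are made up for the example). Note that the separator has to be the empty string `""`; a numeric `0` would be converted to the character "0" and inserted between the pieces.

```r
x <- c("a", "b")

with_default_sep <- paste(x[1], x[2])            # sep defaults to a single space
with_empty_sep   <- paste(x[1], x[2], sep = "")  # empty string as separator
with_paste0      <- paste0(x[1], x[2])           # paste0 behaves like sep = ""

print(c(with_default_sep, with_empty_sep, with_paste0))
# "a b" "ab" "ab"
```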

Here is my code:

p <- function(v) {
  Reduce(f=paste0, x = v)
}

data %>% 
    group_by(foo) %>% 
    summarise(bars_by_foo = p(as.character(bar))) %>%
    merge(., data, by = 'foo') %>%
    select(foo, bar, bars_by_foo)

Resulting in:

  foo bar bars_by_foo
1   1   a          ab
2   1   b          ab
3   2   a           a
4   3   b         bcd
5   3   c         bcd
6   3   d         bcd
989
plumbus_bouquet
  • The use of summarise really sped up my similar operation. I wasn't doing any grouping, just concatenating the full column, so didn't think of it. Good call. Cut my paste0 down from 5+ minutes to ~3 secs. – Seth_P Apr 06 '17 at 08:50
  • 2
    The `collapse` argument of `paste` will be a much more efficient way to do `Reduce(paste())`. – Gregor Thomas Apr 17 '19 at 16:43
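The point made in the comment above can be checked with a short base-R sketch: folding with `Reduce` and joining with `collapse` give the same string for a non-empty character vector, but the latter does it in a single vectorised call. The variable names are illustrative only.

```r
v <- c("b", "c", "d")

# Pairwise folding: paste0("b", "c"), then paste0("bc", "d").
via_reduce <- Reduce(f = paste0, x = v)

# One vectorised call that joins every element at once.
via_collapse <- paste0(v, collapse = "")

print(identical(via_reduce, via_collapse))  # TRUE
```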
6

You can try this:

agg <- aggregate(bar~foo, data = data, paste0, collapse="")
df <- merge(data, agg, by = "foo", all = TRUE)
colnames(df) <- c(colnames(data), "bars_by_foo") # optional


#   foo bar bars_by_foo
# 1   1   a          ab
# 2   1   b          ab
# 3   2   a           a
# 4   3   b         bcd
# 5   3   c         bcd
# 6   3   d         bcd
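For readers who want to follow the two steps separately, here is a self-contained sketch of the same base-R approach. It assumes `bar` is stored as character (hence the explicit `stringsAsFactors = FALSE`), and it shows the intermediate column names that `merge` produces, which is why the rename step is useful.

```r
data <- data.frame(foo = c(1, 1, 2, 3, 3, 3),
                   bar = c("a", "b", "a", "b", "c", "d"),
                   stringsAsFactors = FALSE)

# Step 1: one row per foo, with the bar values of that group joined together.
agg <- aggregate(bar ~ foo, data = data, paste0, collapse = "")

# Step 2: join back onto the original rows. Both inputs have a column named
# bar, so merge() disambiguates them as bar.x and bar.y.
df <- merge(data, agg, by = "foo", all = TRUE)
merged_names <- colnames(df)

# Step 3: restore the original names and label the new column.
colnames(df) <- c(colnames(data), "bars_by_foo")
print(df)
```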
989
1

Your function works if you ensure that the values of bar are character strings and not levels of a factor.

data <- data.frame(foo=c(1, 1, 2, 3, 3, 3), bar=c('a', 'b', 'a', 'b', 'c', 'd'),
                   stringsAsFactors = FALSE)

library("dplyr")

p <- function(v) {
  Reduce(f=paste, x = v)
}

data %>% 
  group_by(foo) %>% 
  mutate(bars_by_foo=p(bar))


Source: local data frame [6 x 3]
Groups: foo [3]

   foo   bar bars_by_foo
  <dbl> <chr>       <chr>
    1     1     a     a b
    2     1     b     a b
    3     2     a       a
    4     3     b   b c d
    5     3     c   b c d
    6     3     d   b c d
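One plausible reading of the original error, offered here as an assumption rather than a confirmed diagnosis, is that the helper returns different types for different groups when bar is a factor. The base-R sketch below shows this: for a group with several values `paste` runs and yields a character string, but for a single-value group `Reduce` never calls `paste` and simply hands back the lone factor element. Converting with `as.character` first makes both cases consistent. The spaces in the output above come from `paste`'s default separator; `paste0` would give "ab" and "bcd" instead.

```r
bar <- factor(c("a", "b", "c"))

p <- function(v) {
  Reduce(f = paste, x = v)
}

multi  <- p(bar[1:2])  # paste() runs, so the result is a character string
single <- p(bar[1])    # paste() never runs; the factor element is returned as-is

print(class(multi))    # "character"
print(class(single))   # "factor"

# Converting to character up front gives the same type in both cases.
single_chr <- p(as.character(bar[1]))
print(class(single_chr))  # "character"
```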