I have a dataframe with a set of character strings in one column, and a grouping variable (a string, but could be a factor) in another. I'd like to collapse the dataframe so that the strings are pasted together into one element per value of the grouping variable. For info, I'm then going to use Corpus(VectorSource(x)) on that vector (i.e., I'm collapsing to create documents).

So for example:

    eg          Type
1   tomato      F
2   mushrooms   F
3   snow        W
4   chips       F
5   rain        W

This would be converted into a character vector with two elements, the members of 'W' and the members of 'F'. I know I can use:

a <- paste(x$eg,collapse=" ")

to get all of them, and of course I could just manually create subsets (or loop). I was wondering if there was a plyr function (but couldn't see one), and I think tapply or by might be what I'm looking for (in base), but I'm not clear how they'd be used here.

I'm not looking to output a dataframe here, but having explored the flagged duplicates, those methods clearly apply to this question.

sjgknight
    Other alternatives using `dplyr`, `data.table` etc [**here**](http://stackoverflow.com/questions/26981385/r-collapse-all-columns-by-an-id-column/26981611#26981611) – Henrik Jan 23 '15 at 13:51

2 Answers

Just found an answer, this should work from the plyr package:

library(plyr)
a <- vaggregate(x$eg, x$Type, function(y) paste0(y, collapse = " "))

EDIT: See comments below - the function(y) wrapper is superfluous (paste0 and its collapse argument can be passed to vaggregate directly), and this can also be done from base.
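
For reference, here is a minimal base-R sketch of what that could look like with tapply, which the question already mentions. It assumes the example data from the question sits in a data frame called x; the names x and docs are only illustrative.

```r
# Rebuild the example data from the question (assumed to be a data frame named x)
x <- data.frame(
  eg   = c("tomato", "mushrooms", "snow", "chips", "rain"),
  Type = c("F", "F", "W", "F", "W"),
  stringsAsFactors = FALSE
)

# tapply() splits x$eg by x$Type and applies paste() to each group;
# collapse = " " is passed through to paste()
docs <- tapply(x$eg, x$Type, paste, collapse = " ")

# Drop the array attributes to get a plain character vector named by group
docs <- setNames(as.vector(docs), names(docs))
docs
```

The result has one element per value of Type, so it should be usable as the input to Corpus(VectorSource(docs)) as described in the question.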

sjgknight

Answer using data.table package:

> library(data.table)
> dt <- data.table(eg = letters[1:8], Type = rep(c("F", "W"), 4))
> a <- dt[, paste(eg, collapse = " "), by = Type]
> a
   Type      V1
1:    F a c e g
2:    W b d f h

The bonus of using data.table is that this will still run in a few seconds even if you get up to millions of rows.
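
Since the question asks for a character vector rather than a table, a sketch along these lines would apply the same idea to the question's data and pull the collapsed strings back out. The column name doc and the object names collapsed and docs are assumptions made for illustration.

```r
library(data.table)

# Example data from the question
dt <- data.table(
  eg   = c("tomato", "mushrooms", "snow", "chips", "rain"),
  Type = c("F", "F", "W", "F", "W")
)

# One row per Type, with the strings pasted together in a named column
collapsed <- dt[, .(doc = paste(eg, collapse = " ")), by = Type]

# Extract a plain character vector, named by group
docs <- setNames(collapsed$doc, collapsed$Type)
docs
```

Naming the column in the j expression avoids relying on the default V1 name shown above.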

LauriK
    Thanks @Laurik I should explore data.table more – sjgknight Jan 23 '15 at 14:16
    @sjgknight, you might find the [new vignettes](https://github.com/Rdatatable/data.table/issues/944) helpful (other vignettes in preparation). – Arun Jan 25 '15 at 01:01