I have a dataset as follows:

col1    col2
a        1
a        2
b        1
b        3
c        4

I want the output as follows:

col1    col2
a        1,2
b        1,3
c        4

How can I do this in R?

Nadeem Hussain

1 Answer

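We can group by 'col1' and paste the values of 'col2' together using the collapse argument. A convenient wrapper is toString, which is equivalent to paste(x, collapse = ", "), so the values are separated by a comma and a space. This can be done with any of the aggregate-by-group functions. For example, with data.table, we convert the 'data.frame' to a 'data.table' by reference (setDT(df1)) and then apply toString to 'col2' within each group of 'col1':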

library(data.table)
setDT(df1)[, list(col2 = toString(col2)), by = col1]

Or with aggregate from base R

aggregate(col2~col1, df1, FUN=toString)
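Note that toString separates values with ", " (comma plus space), while the desired output in the question has no space. A minimal sketch of one way to match that exactly, assuming df1 looks like the data in the question (the construction of df1 below is my reconstruction, since the post does not show it), is to pass paste with collapse = "," through aggregate:

```r
# Assumed reconstruction of the question's data; the original post
# does not show how df1 was created.
df1 <- data.frame(col1 = c("a", "a", "b", "b", "c"),
                  col2 = c(1, 2, 1, 3, 4),
                  stringsAsFactors = FALSE)

# toString(x) is paste(x, collapse = ", "): comma followed by a space.
res_space <- aggregate(col2 ~ col1, df1, FUN = toString)

# Extra arguments to aggregate() are forwarded to FUN, so collapse = ","
# reaches paste() and the values are joined without a space.
res_nospace <- aggregate(col2 ~ col1, df1, FUN = paste, collapse = ",")
print(res_nospace)
```

The same paste(col2, collapse = ",") call should also work in place of toString(col2) in the data.table and dplyr versions.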

If you need a list output for 'col2'

aggregate(col2~col1, df1, FUN=I)

Or using dplyr

library(dplyr)
df1 %>%
  group_by(col1) %>%
  summarise(col2 = toString(col2))
akrun
  • Bow to the master, but which of these is fast and memory efficient? – The6thSense Jun 02 '15 at 13:25
  • @VigneshKalai Thanks for the comment. `aggregate` would be slow for big datasets. I think `dplyr` and `data.table` perform similarly, though I would give some edge to `data.table` (not benchmarked, though). – akrun Jun 02 '15 at 13:28
  • @VigneshKalai, don't take offence, but instead of asking on different questions which approach is fast and memory efficient, you can explore this on your own. Please take a look at the `microbenchmark` package, and read http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example to understand how to generate simulated datasets. Then run benchmarks on these simulated datasets and come to your own conclusions. Doing so would also help you understand R and common packages better. – A5C1D2H2I1M1N2O1R2T1 Jun 02 '15 at 13:58
  • Thanks @AnandaMahto, it really helped a lot, and it is in no way an offence. How would I learn if I don't do the dirty work? – The6thSense Jun 03 '15 at 07:46
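
For anyone who wants to try the benchmarking advice from the comments, here is a rough sketch. It makes several assumptions: the simulated data (size, number of groups) is arbitrary, base system.time is swapped in for `microbenchmark` to avoid an extra dependency, the helper names (sim, approaches, reps, timings) are mine, and the `data.table` and `dplyr` variants are timed only if those packages are installed. No timings are shown because they depend on the machine and package versions.

```r
# Simulated dataset; the row count and the 26 groups are arbitrary assumptions.
set.seed(1)
n <- 1e4
sim <- data.frame(col1 = sample(letters, n, replace = TRUE),
                  col2 = sample(1:9, n, replace = TRUE),
                  stringsAsFactors = FALSE)

# Each approach is wrapped in a function so all of them can be timed the same way.
approaches <- list(
  aggregate = function() aggregate(col2 ~ col1, sim, FUN = toString)
)

# Optional packages: added only when available on this machine.
if (requireNamespace("data.table", quietly = TRUE)) {
  approaches$data.table <- function() {
    data.table::as.data.table(sim)[, list(col2 = toString(col2)), by = col1]
  }
}
if (requireNamespace("dplyr", quietly = TRUE)) {
  approaches$dplyr <- function() {
    dplyr::summarise(dplyr::group_by(sim, col1), col2 = toString(col2))
  }
}

# Total elapsed seconds over a few repetitions of each approach.
reps <- 5
timings <- sapply(approaches, function(f) {
  system.time(for (i in seq_len(reps)) f())[["elapsed"]]
})
print(timings)
```

For finer-grained measurements, the same functions can be passed to `microbenchmark` as suggested above, and n can be increased to see how each approach scales.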