
I am trying to convert data from multiple rows in a data frame to a list (or similar structure).

My data looks like this:

data.frame("a"=c(1,1,2,3,3,3), "b"=c("x","y","x","x","y","z"))
  a b
1 1 x
2 1 y
3 2 x
4 3 x
5 3 y
6 3 z

and the result I'm looking for is something like this:

  a       b
1 1    x, y
2 2       x
3 3 x, y, z

I can do this inefficiently by looping over all rows of the data frame and appending to individual lists, but I wanted to see if there is a better way. I am currently studying the data.table package and believe it contains a solution for this, but I haven't found it yet.

Thanks for your help!

yarbaur

1 Answer


We can use aggregate, where df is the data frame from the question:

aggregate(b~a, df, FUN=toString)
#  a       b
#1 1    x, y
#2 2       x
#3 3 x, y, z

Or with data.table: convert the 'data.frame' to a 'data.table' with setDT(df), group by 'a', and paste the elements of 'b' together with toString (a wrapper for paste(..., collapse=", ")):

library(data.table)
setDT(df)[, list(b = toString(b)), by = a]
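
Since the question asks for "a list (or similar structure)", here is a minimal base R sketch that builds both forms from the sample data. The names collapsed and as_list are only illustrative, and stringsAsFactors = FALSE is an assumption made here so that 'b' stays character on any R version.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps 'b' as character
df <- data.frame(a = c(1, 1, 2, 3, 3, 3),
                 b = c("x", "y", "x", "x", "y", "z"),
                 stringsAsFactors = FALSE)

# One comma-separated string per value of 'a'
collapsed <- aggregate(b ~ a, data = df, FUN = toString)

# One character vector per value of 'a', returned as a named list
as_list <- split(df$b, df$a)
```

The string form is convenient for printing, while the split() result keeps each group's values as separate elements, which is usually easier to work with in later steps.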
akrun