2

I would like to take a data frame and collapse the rows on a column and simply create another column that is a vector of all the values.

For example I would like to transform a data frame like this:

id  item
1   100
1   103
1   109
1   101
2   102
2   109
2   107
2   105
3   105
3   106
3   101
3   102
3   110

To:

id  item
1   (100,103,109,101)
2   (102,109,107,105)
3   (105,106,101,102,110)

So the first column is the unique id and the second column is a vector/list of all the items that were seen for that id. Seems like this should be easy but I haven't been able to find a solution. As noted in the example the amount of items can vary per id.

Here is code to create the initial data frame I am trying to transform.

id <- c(1,1,1,1,2,2,2,2,3,3,3,3,3)
item <- c(100,103,109,101,102,109,107,105,105,106,101,102,110)
input_frame <- data.frame(cbind(id,item))
Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138
MMark
  • 21
  • 1
  • A similar question is https://stackoverflow.com/questions/26981385/r-collapse-all-columns-by-an-id-column although that one asks how to collapse _all_ columns except the grouping column – talat May 26 '15 at 19:02

4 Answers4

5

You can use aggregate function to group by id and then collect the respective elements from item using c function:

aggregate(formula = item ~ id, data = input_frame, FUN = c)

Output:

   id                    item
1  1      100, 103, 109, 101
2  2      102, 109, 107, 105
3  3 105, 106, 101, 102, 110
B.Shankar
  • 1,271
  • 7
  • 11
0
library (plyr)
ddply(input_frame,.(id),summarize,val=paste(item,collapse=","))
  id                 val
1  1     100,103,109,101
2  2     102,109,107,105
3  3 105,106,101,102,110
Sandy Muspratt
  • 31,719
  • 12
  • 116
  • 122
Alexey Ferapontov
  • 5,029
  • 4
  • 22
  • 39
0

This is a solution that produces a nested list:

item_list <- lapply(unique(id), function(i)
                   list(id=i,item=input_frame$item[id==i]) )

You can access the items for, say, id==2 by

item_list[[2]]$item
Stibu
  • 15,166
  • 6
  • 57
  • 71
0
library(dplyr)
summarise(group_by(input_frame,id),item=paste(item,collapse=','))
Shenglin Chen
  • 4,504
  • 11
  • 11