0

I have data like this:

ID=c("A123","A123","A123","A123","B456","B456","B456","C789","C789")
item=c("bread", "butter", "milk", "eggs", "meat","milk", "peas", "peas", "meat")

df=data.frame(cbind(ID, item))
ID   item
1 A123  bread
2 A123 butter
3 A123   milk
4 A123   eggs
5 B456   meat
6 B456   milk
7 B456   peas
8 C789   peas
9 C789   meat

That I've transformed into a sparce matrix like this:

df_g = dcast(df,ID~item,fill=0, length)

df_g
    ID bread butter eggs meat milk peas
1 A123     1      1    1    0    1    0
2 B456     0      0    0    1    1    1
3 C789     0      0    0    1    0    1

How could I get the df_g-data to this "baskets" form?:

ID      basket
A123    bread,butter,eggs,milk
B456    meat,milk,peas
C789    meat,peas

I know it would be easy to transform df straight to "baskets" form. This is however not the solution. The order of the data changes when transforming it from df to df_g and this is why I need to get the "baskets" -form from the df_g -data.

ElinaJ
  • 791
  • 1
  • 6
  • 18
  • Are we talking `aggregate(item ~ ID, df, toString)`? – David Arenburg Apr 13 '16 at 16:13
  • Thanks David, but I would at this time need it specifically from the df_g form to the "baskets" -form. – ElinaJ Apr 13 '16 at 16:16
  • I feel like this an XY problem. So just don't convert to a wide format? Working over rows is much less efficient than over a single column – David Arenburg Apr 13 '16 at 16:22
  • You can sort while aggregating.. `aggregate(item ~ ID, df, function(x) toString(sort(x)))` Or using `data.table` you can sort and only then aggregate in a single code line `library(data.table) ; setDT(df)[order(item), toString(item), by = ID]` – David Arenburg Apr 13 '16 at 16:27
  • I need the wide format format for some intermediate calculations and after that I need the denser "baskets" -form for reporting purposes. The wide would not work for reporting because there is 1500 columns, when data is in approx 1-20 columns. – ElinaJ Apr 13 '16 at 16:28
  • 1
    Ok, suit yourself. Reopened. Eitherway, you could melt it back and then do the aggregation. Here's a `data.table` example `melt(setDT(df_g), 1L)[value == 1, toString(variable), by = ID]` – David Arenburg Apr 13 '16 at 16:29

1 Answers1

1

As others have said, if you don't have to be in the wide format, there are easier solutions. However, if your data must be in this format, the following should work:

df_g[df_g==0]<-NA
df_g2<-lapply(df_g[,-1],factor)
df_g2<-data.frame(Map(function (x,y) {levels(x)<-y ; return(x)},df_g2,names(df_g2)))
df_g$basket<-apply(df_g2,1,function (x) paste(na.omit(x),sep=',',collapse=','))

df_g
    ID bread butter eggs meat milk peas                 basket
1 A123     1      1    1   NA    1   NA bread,butter,eggs,milk
2 B456    NA     NA   NA    1    1    1         meat,milk,peas
3 C789    NA     NA   NA    1   NA    1              meat,peas
Community
  • 1
  • 1
Craig
  • 4,492
  • 2
  • 19
  • 22