
I have a data frame that contains a customerid column and a list column. I would like to merge the lists that belong to the same customer.

library(plyr)
subsets <- list(c("a", "d", "e"), c("a", "b", "c", "e"))
customerids <- c(1,1)
transactions <- data.frame(customerid = customerids,subset =I(subsets))
> transactions
  customerid     subset
1          1    a, d, e
2          1 a, b, c, e

When I merge the subsets with ddply, I get an expanded result:

> ddply(transactions, .(customerid), summarise, subset=Reduce(union,subset))
  customerid subset
1          1   a
2          1   d
3          1   e
4          1   b
5          1   c

whereas I expected all the results in a single row.
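
A likely explanation for the expansion, sketched in base R only (no plyr needed): `Reduce(union, subset)` returns a plain character vector of length 5, and `summarise` appears to turn each element into its own row. Wrapping the vector in `list()` makes it a single element. The names `merged` and `wrapped` are illustrative and not from the original post.

```r
# Same input lists as in the question
subsets <- list(c("a", "d", "e"), c("a", "b", "c", "e"))

# Reduce(union, ...) folds the lists into one character vector
merged <- Reduce(union, subsets)
print(merged)          # "a" "d" "e" "b" "c"
print(length(merged))  # 5, which matches the five rows ddply produced

# Wrapping in list() gives one element holding the whole vector
wrapped <- list(merged)
print(length(wrapped)) # 1
```

This is only a way to inspect the intermediate value; it does not by itself change what `ddply` returns.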

lokheart
nicolas
  • The step creating the dataframe throws an error. You probably created this differently and should post `dput(transactions)`. I don't think dataframes hold list objects very well. There's a well-known difficulty with POSIXlt objects in dataframes as well. – IRTFM Jul 15 '13 at 21:25
  • Indeed, I copied the wrong input (without the `I` operator); this is fixed now. – nicolas Jul 15 '13 at 21:28
  • +1 for the `I` creating the list element within the data.frame. – agstudy Jul 15 '13 at 21:30
  • @agstudy, that was his [earlier question](http://stackoverflow.com/q/17662596/559784) today. – Arun Jul 15 '13 at 21:32
  • @Arun I see...so +1 for you also! – agstudy Jul 15 '13 at 21:33
  • @nicolas I would transform my data to a list in this case: `by(transactions$subset,transactions$customerid,unlist)` – agstudy Jul 15 '13 at 21:39
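
In the spirit of the comment above about working with a list rather than a data frame, here is a minimal base R sketch. It uses `split` and `lapply` on plain vectors instead of the `by` call from the comment, and the third subset and second customer id are made-up data added only to show the grouping.

```r
# Input from the question, plus one hypothetical extra customer (id 2)
subsets <- list(c("a", "d", "e"), c("a", "b", "c", "e"), c("x", "y"))
customerids <- c(1, 1, 2)

# Group the list elements by customer id
by_customer <- split(subsets, customerids)

# Flatten each group and keep only the distinct values
merged <- lapply(by_customer, function(s) unique(unlist(s)))

print(merged[["1"]])  # "a" "d" "e" "b" "c"
print(merged[["2"]])  # "x" "y"
```

The result is a named list keyed by customer id, so no list column inside a data frame is needed.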

1 Answer


You can do something like this:

ddply(transactions, .(customerid), function(x) 
            data.frame(subset=I(list(unlist(x$subset)))))

Edit: I'm not sure I follow your comments, but if you want just the unique values of subset within each customerid, then:

ddply(transactions, .(customerid), function(x) 
            data.frame(subset=I(list(unique(unlist(x$subset))))))
Arun
  • good to know. makes sense when looking at ddply implementation – nicolas Jul 15 '13 at 21:36
  • actually, union does not keep duplicates, so 'list' should be 'unique' for reference. thanks ! – nicolas Jul 15 '13 at 21:38
  • actually, this simple modification seems to be not trivial... so I have to remove the mark. – nicolas Jul 15 '13 at 22:11
  • `basketByCustomer2 <- ddply(basketByCustomer, .(customerid), function(x) { data.frame(subset=I(unique(x$subset)))})` – nicolas Jul 15 '13 at 22:17
  • You're calling `unique` on a `list` here in your last comment. That'll return only unique lists, not unique elements within the lists... – Arun Jul 15 '13 at 22:21
  • indeed. sorry for the confusion, and thanks for the updated answer. – nicolas Jul 19 '13 at 15:49
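
The distinction raised in the comments, `unique` on a list versus `unique` on the flattened values, can be checked with a small base R sketch. The third vector is a made-up duplicate of the first, added so that the two calls give visibly different results.

```r
# Two lists from the question plus a hypothetical exact duplicate of the first
subsets <- list(c("a", "d", "e"), c("a", "b", "c", "e"), c("a", "d", "e"))

# unique() on a list removes duplicated list elements (whole vectors)
unique_lists <- unique(subsets)
print(length(unique_lists))  # 2

# unlist() first, then unique(), removes duplicated values across all vectors
unique_elems <- unique(unlist(subsets))
print(unique_elems)          # "a" "d" "e" "b" "c"
```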