I have a data frame A in the following format:

user         item
10000000     1      # each user is an 8-digit integer; each item is an integer of up to 5 digits
10000000     2
10000000     3
10000001     1
10000001     4
..............

What I want is a list B in which each element is named after a user and holds a vector of the items corresponding to that user.

For example:

B = list(c(1,2,3),c(1,4),...)    

I also need to attach the user names to B. To apply association rule learning, the items need to be converted to characters.

Originally I used tapply(A$user,A$item, c), but its output is not compatible with the association rule package. See my post:

data format error in association rule learning R

But @sgibb's solution also seems to generate an array, not a list.

library("arules")
temp <- as(C, "transactions")    # C is output using @sgibb's solution

This throws an error:

Error in as(C, "transactions") : 
  no method or default for coercing “array” to “transactions”
Jin
  • Please please please use `dput` to share your data. [See here for reasons and more details](http://stackoverflow.com/q/5963269/903061), it makes it much easier to help. – Gregor Thomas Apr 05 '14 at 22:21
  • `?dlply` or `?tapply` – hrbrmstr Apr 05 '14 at 22:26
  • Also, in your previous question you mentioned `split`. See `split(A$item, A$user)` – alexis_laz Apr 05 '14 at 22:27
  • @alexis_laz, you are right. I spent the whole afternoon trying to track down the bug – Jin Apr 05 '14 at 22:36
  • @alexis_laz, would you post your solution? I am really tired after trying all afternoon. – Jin Apr 05 '14 at 22:53
  • alexis_laz, are we applying association rules correctly if we use the split function directly, without "tapply", i.e., are the correct items grouped together? – Jin Apr 05 '14 at 23:23
  • @Jin the output of `tapply` and `split` is the same. The only difference is `class(tapply(...)) == "array"` and `class(split(...)) == "list"`. – sgibb Apr 05 '14 at 23:27
  • I see. Thanks a lot to both alexis_laz and sgibb – Jin Apr 05 '14 at 23:29
  • @alexis_laz, your solution split(A$item, A$user) produces duplicated items in many users' item lists. How can I remove these duplicates? I already tried a loop combined with unique, but it failed. Thanks – Jin Apr 06 '14 at 00:07
  • Perhaps try something like `lapply(split(A$item, A$user), unique)`. Should there be duplicated items, though? If not, maybe you've made a miscalculation somewhere when building `A`? I only say this because neither `split` nor `tapply` has anything to do with a possible duplication of values. – alexis_laz Apr 06 '14 at 00:22
  • Finally it worked. Thanks so much for your efforts, @alexis_laz – Jin Apr 06 '14 at 00:41
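
The class difference between `tapply` and `split` mentioned in the comments can be checked directly. This is a minimal sketch in base R on the five example rows from the question; the object names `by_tapply` and `by_split` are made up for illustration. The user ids are written as integer literals because `as.character(10000000)` on a double gives "1e+07", which would change the list names.

```r
# Example rows from the question; integer literals keep the ids un-abbreviated.
A <- data.frame(
  user = c(10000000L, 10000000L, 10000000L, 10000001L, 10000001L),
  item = c(1L, 2L, 3L, 1L, 4L)
)

# Group items by user in two ways.
by_tapply <- tapply(A$item, A$user, c)  # one-dimensional array holding vectors
by_split  <- split(A$item, A$user)      # plain named list

class(by_tapply)  # "array"
class(by_split)   # "list"

# The grouped contents agree; only the container differs.
identical(by_tapply[[1]], by_split[[1]])
```

Since the error above comes from trying to coerce an "array", the plain list returned by `split` is the more convenient starting point.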

1 Answer

Have a look at tapply:

df <- read.table(textConnection("
user         item
10000000     1
10000000     2
10000000     3
10000001     1
10000001     4"), header=TRUE)

B <- tapply(df$item, df$user, FUN=as.character)
B
# $`10000000`
# [1] "1" "2" "3"
#
# $`10000001`
# [1] "1" "4"

EDIT: I do not know the arules package, but here is the solution proposed by @alexis_laz:

library("arules")
as(split(df$item, df$user), "transactions")
# transactions in sparse format with
#  2 transactions (rows) and
#  4 items (columns)
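
The question also asks for character items and for list elements named by user. As a hedged sketch in base R only, converting the items before splitting gives both at once; `df` is rebuilt here with `data.frame` and integer literals instead of `read.table`, which is an assumption made to keep the snippet self-contained.

```r
# Same five rows as in the answer, built without textConnection.
df <- data.frame(
  user = c(10000000L, 10000000L, 10000000L, 10000001L, 10000001L),
  item = c(1L, 2L, 3L, 1L, 4L)
)

# Convert items to character first, then group them by user.
# split() names each list element after the corresponding user id.
B <- split(as.character(df$item), df$user)

names(B)  # user ids, as character strings
str(B)    # a named list of character vectors
```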
sgibb
  • Your solution provides an array, not a list. Assuming your output is B, brand_table <- as(B, "transactions") will complain (after you load library("arules")) – Jin Apr 05 '14 at 22:43
  • @Jin: You should have mentioned the *arules* package, your aim and your link to the previous question from the beginning. See my edit. – sgibb Apr 05 '14 at 23:12
  • You are not using B in the as command? Why use split(df$item, df$user) here? – Jin Apr 05 '14 at 23:15
  • @Jin because it is a short example. You could use `B <- split(df$item, df$user); as(B, "transactions")` instead. (And my `B` answer before the edit was the answer to the original question (and to the title of the question).) – sgibb Apr 05 '14 at 23:16
  • I see your point. But are we using association rules correctly if we first collect all items belonging to the same user? Can we split directly? – Jin Apr 05 '14 at 23:21
  • @Jin I have no idea. As I already mentioned, I do not know what the *arules* package is doing, nor what an object of the `transactions` class should do or look like. I just provided an answer to your question. The input is technically correct, but I don't know anything about its logical correctness. – sgibb Apr 05 '14 at 23:25
  • B <- split(df$item, df$user); as(B, "transactions") failed because in some users' transactions, items appear more than once (i.e., they are duplicated). I tried to use a loop and a set to remove these but failed – Jin Apr 06 '14 at 00:08
  • @Jin: `as(lapply(split(df$item, df$user), unique), "transactions")` – sgibb Apr 06 '14 at 07:40
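
The de-duplication from the last comment can be tried on a small case. This is a sketch on invented toy data with one repeated (user, item) pair; `A_unique` and `B_alt` are illustrative names, and the second approach (dropping duplicated rows before splitting) is an alternative not taken from the thread. The coercion to transactions assumes the arules package is installed and is skipped otherwise.

```r
# Toy data: user 10000000 has item 2 twice (values invented for illustration).
A <- data.frame(
  user = c(10000000L, 10000000L, 10000000L, 10000001L, 10000001L),
  item = c(1L, 2L, 2L, 1L, 4L)
)

# Approach from the comments: group by user, then drop repeats in each vector.
B <- lapply(split(A$item, A$user), unique)

# Alternative: remove duplicated rows first, then group.
A_unique <- unique(A[, c("user", "item")])
B_alt <- split(A_unique$item, A_unique$user)

# Coerce to transactions only when arules is available.
if (requireNamespace("arules", quietly = TRUE)) {
  library("arules")
  trans <- as(B, "transactions")
  print(trans)
}
```

Both approaches leave each user with distinct items; which one is preferable may depend on whether the duplicated rows in `A` are needed elsewhere.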