create list based on data frame in R

Question

I have a data frame A in the following format:

user         item
10000000     1
10000000     2
10000000     3
10000001     1
10000001     4
..............

Each user is an 8-digit integer, and each item is an integer of up to 5 digits.

What I want is a list B whose element names are the user IDs, where each element is a vector of the items belonging to that user.

For example:

B = list(c(1,2,3),c(1,4),...)

I also need to attach the user names to B. To apply association rule learning, the items need to be converted to characters.
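A minimal sketch of the target structure, written out by hand for the two users shown in the sample (the remaining rows are not reproduced): a named list whose names are the user IDs and whose elements are character vectors of that user's items.

```r
# Target structure for the two sample users, built by hand:
# names are the user IDs, elements are the items as character vectors.
B <- list(
  "10000000" = c("1", "2", "3"),
  "10000001" = c("1", "4")
)
str(B)
```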

Originally I used tapply(A$user, A$item, c), but the result is not compatible with the association rule package. See my previous post:

data format error in association rule learning R

But @sgibb's solution also seems to generate an array, not a list.

library("arules")
temp <- as(C, "transactions")    # C is output using @sgibb's solution

This throws an error:

Error in as(C, "transactions") : 
no method or default for coercing “array” to “transactions”

Please please please use `dput` to share your data. [See here for reasons and more details](http://stackoverflow.com/q/5963269/903061), it makes it much easier to help. — Gregor Thomas, Apr 05 '14 at 22:21
Also, in your previous question you mentioned `split`. See `split(A$item, A$user)` — alexis_laz, Apr 05 '14 at 22:27
@alexis_laz, you are right. It took me the whole afternoon to dig out the bug. — Jin, Apr 05 '14 at 22:36
@alexis_laz, would you post your solution? I am really tired after a whole afternoon of trying. — Jin, Apr 05 '14 at 22:53
@alexis_laz, are we applying association rules correctly if we use the split function directly, without first collecting the items together with "tapply"? — Jin, Apr 05 '14 at 23:23
@Jin the output of `tapply` and `split` is the same. The only difference is `class(tapply(...)) == "array"` and `class(split(...)) == "list"`. — sgibb, Apr 05 '14 at 23:27
@alexis_laz, your solution split(A$item, A$user) produces duplicated items in many users' item lists. How can I remove these duplicates? I already tried a loop combined with unique, but it failed. Thanks — Jin, Apr 06 '14 at 00:07
Perhaps try something like `lapply(split(A$item, A$user), unique)`. Should there be duplicated items, though? If not, maybe you've made a miscalculation somewhere when building `A`? I only say this because neither `split` nor `tapply` has anything to do with a possible duplication of values. — alexis_laz, Apr 06 '14 at 00:22
Finally it worked. Thanks so much for your efforts, @alexis_laz — Jin, Apr 06 '14 at 00:41
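As the first comment suggests, `dput()` prints an R expression that recreates an object, so others can paste the data straight into their session. A minimal sketch using only the five sample rows from the question (the full `A` is not available here); writing the expression to a temporary file and reading it back with `dget()` shows that the round trip reproduces the data frame.

```r
# Rebuild the five sample rows from the question as a data frame.
A <- data.frame(
  user = c(10000000L, 10000000L, 10000000L, 10000001L, 10000001L),
  item = c(1L, 2L, 3L, 1L, 4L)
)

# Print a copy-pasteable expression for A (this is what to share in a question).
dput(A)

# The same expression can be written to a file and read back with dget().
tf <- tempfile(fileext = ".R")
dput(A, file = tf)
A2 <- dget(tf)
```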

sgibb · Accepted Answer · 2014-04-05T23:09:51.367


Have a look at tapply:

df <- read.table(textConnection("
user         item
10000000     1
10000000     2
10000000     3
10000001     1
10000001     4"), header=TRUE)

B <- tapply(df$item, df$user, FUN=as.character)
B
# $`10000000`
# [1] "1" "2" "3"
#
# $`10000001`
# [1] "1" "4"
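To see why `as(B, "transactions")` complained, one can compare the classes of the `tapply()` and `split()` results, as discussed in the comments on the question. A minimal sketch on the same sample rows, rebuilt with integer columns so the block runs on its own: both group the items by user, but `tapply()` returns a one-dimensional array while `split()` returns a plain named list.

```r
# Same sample data as above, with the user IDs stored as integers.
df <- data.frame(
  user = c(10000000L, 10000000L, 10000000L, 10000001L, 10000001L),
  item = c(1L, 2L, 3L, 1L, 4L)
)

B_tapply <- tapply(df$item, df$user, FUN = as.character)
B_split  <- split(df$item, df$user)

class(B_tapply)  # a one-dimensional array, not a plain list
class(B_split)   # a plain list named by user ID
```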

EDIT: I do not know the arules package, but here is the solution proposed by @alexis_laz:

library("arules")
as(split(df$item, df$user), "transactions")
# transactions in sparse format with
#  2 transactions (rows) and
#  4 items (columns)
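If, as the question asks, the items should also be character strings while the list keeps the user IDs as names, `split()` can be combined with `lapply()`. A minimal sketch, assuming the same five sample rows; the resulting plain list is the kind of object passed to `as(..., "transactions")` above.

```r
df <- data.frame(
  user = c(10000000L, 10000000L, 10000000L, 10000001L, 10000001L),
  item = c(1L, 2L, 3L, 1L, 4L)
)

# split() groups the items by user and names the list after the user IDs;
# lapply(..., as.character) converts each user's items to character.
B <- lapply(split(df$item, df$user), as.character)
str(B)
```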

answered Apr 05 '14 at 22:25 by sgibb, edited Apr 05 '14 at 23:09

Your solution produces an array, not a list. If your output is B, brand_table <- as(B, "transactions") will complain (after you install the package and load it with library("arules")) – Jin Apr 05 '14 at 22:43
@Jin: You should have mentioned the *arules* package, your aim and your link to the previous question from the beginning. See my edit. – sgibb Apr 05 '14 at 23:12
You are not using B in the as() command? Why use split(df$item, df$user) here? – Jin Apr 05 '14 at 23:15
@Jin because it is a short example. You could use `B <- split(df$item, df$user); as(B, "transactions")` instead. (And my `B` answer before the edit was the answer to the original question (and to the title of the question).) – sgibb Apr 05 '14 at 23:16
I see your point. But are we applying association rules correctly if we collect all items belonging to the same user first? Can we split directly? – Jin Apr 05 '14 at 23:21
@Jin I have no idea. As I already mentioned, I do not know what the *arules* package is doing, nor what an object of the `transactions` class should do or look like. I just provided an answer to your question. The input is technically correct, but I don't know anything about the logical correctness of the input. – sgibb Apr 05 '14 at 23:25
B <- split(df$item, df$user); as(B, "transactions") failed because in some users' transactions, items appear more than once (i.e., they are duplicated). I tried to remove them using a loop and set operations, but it failed. – Jin Apr 06 '14 at 00:08
@Jin: `as(lapply(split(df$item, df$user), unique), "transactions")` – sgibb Apr 06 '14 at 07:40
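The duplicate problem can be reproduced with a small hypothetical data frame in which user 10000000 has item 2 twice (these rows are invented for illustration, not taken from the question). Applying `unique()` to each group, as suggested in the comments, keeps one copy of each item per user; a sketch of only the grouping step, leaving out the arules coercion:

```r
# Hypothetical data: item 2 appears twice for user 10000000.
dup <- data.frame(
  user = c(10000000L, 10000000L, 10000000L, 10000000L, 10000001L, 10000001L),
  item = c(1L, 2L, 2L, 3L, 1L, 4L)
)

grouped <- split(dup$item, dup$user)   # still contains the repeated 2
deduped <- lapply(grouped, unique)     # one copy of each item per user
str(deduped)
```

Note that `unique()` keeps the first occurrence of each item, so the original item order within a user is preserved.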

