
Suppose I have a table arranged as follows:

Movie ID     User ID
12             123
13             421
17             908
.               .
.               .
.               .

and I want to arrange the table as follows:

User ID|Movie ID:12  13   17 . . . 671
123                1   1   1         0
421                1   1   1         0
908                1   1   1         0

Is there an easy way to do this in R, or do I have to write custom code? I want the values to be 1 or 0 depending on whether the user watched the movie.

phil12
  • I guess you can find this useful: http://stackoverflow.com/questions/33457501/transforming-dataset-into-value-matrix/33457722#33457722 – nicola Nov 27 '15 at 17:02
  • Just to be clearer, you have to build a `sparseMatrix`. Follow the linked question and just set the `x` argument in `sparseMatrix` as 1. – nicola Nov 27 '15 at 17:11
  • I am still having an issue: `Warning messages: 1: In asMethod(object) : Reached total allocation of 16343Mb: see help(memory.size)` – phil12 Nov 27 '15 at 17:30
  • @nicola Actually it worked. Is there a way I can see the actual user and movie IDs? I see only the row numbers and column numbers. Is there an option in this function where I can specify this? – phil12 Nov 27 '15 at 18:32
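
For reference, here is a minimal sketch of the `sparseMatrix` approach from the comments above, with the actual IDs attached as `dimnames` so that rows and columns can be looked up by user ID and movie ID. The toy data and the helper names `users`, `movies` and `m` are assumptions for illustration, not taken from the original post.

```r
library(Matrix)

# Toy data in the same two-column layout as the question (hypothetical values)
df1 <- data.frame(MovieID = c(12L, 13L, 17L, 12L),
                  UserID  = c(123L, 421L, 908L, 421L))

# Drop repeated (user, movie) pairs: sparseMatrix() sums duplicated entries,
# which would otherwise give values greater than 1
df1 <- unique(df1)

# Sorted unique IDs become the row and column labels
users  <- sort(unique(df1$UserID))
movies <- sort(unique(df1$MovieID))

# match() turns each ID into its row/column position
m <- sparseMatrix(i = match(df1$UserID, users),
                  j = match(df1$MovieID, movies),
                  x = 1,
                  dims = c(length(users), length(movies)),
                  dimnames = list(as.character(users), as.character(movies)))

# Look up a cell by actual user ID and movie ID
m["421", "12"]
```

Only the nonzero cells are stored, so this should stay far smaller than the dense result of `table` or `dcast` when most users have watched only a few of the movies.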

1 Answer


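We can use `table`. The first column passed to `table` becomes the rows, so select `UserID` first:

as.data.frame.matrix(table(df1[c("UserID", "MovieID")]))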
#     12 13 17
#123  1  0  0
#421  0  1  0
#908  0  0  1

Another option is `dcast` from data.table:

library(data.table)  # v1.9.6+
# count the rows for each UserID/MovieID pair
dcast(setDT(df1), UserID ~ MovieID, value.var = 'MovieID', fun.aggregate = length)
#   UserID 12 13 17
#1    123  1  0  0
#2    421  0  1  0
#3    908  0  0  1

data

df1 <- structure(list(MovieID = c(12L, 13L, 17L), UserID = c(123L, 421L,  
908L)), .Names = c("MovieID", "UserID"), class = "data.frame", 
row.names = c(NA, -3L))
akrun
  • I get an error: Error: cannot allocate vector of size 1.2 Gb with the dcast code. Is there a size limit? – phil12 Nov 27 '15 at 17:02
  • @phil12 Can you try the second option i.e. the `data.table` on a fresh R session? If the file is really big, you may need to do this on a system with more memory. – akrun Nov 27 '15 at 17:04
  • @akrun I did try the second option. I have 16 GB of RAM. Is there an option in R that I need to specify to use more memory? – phil12 Nov 27 '15 at 17:06
  • @phil12 How big is the file? – akrun Nov 27 '15 at 17:06
  • It is around 5 GB but I guess transposing it so that it includes a lot of columns will increase the size significantly – phil12 Nov 27 '15 at 17:07
  • @phil12 If you have already loaded other objects, it may mess up the memory. Try it on a fresh session. – akrun Nov 27 '15 at 17:08
  • It's not a matter of a fresh session. The resulting matrix could be huge, several orders of magnitude bigger than the original `data.frame`. The `sparseMatrix` is the way to go in this instance. – nicola Nov 27 '15 at 17:09
  • @nicola You are correct. The resulting matrix would have 16,000 columns compared to the original (2 columns) – phil12 Nov 27 '15 at 17:12
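
To make the size argument in the comments concrete, here is a rough back-of-the-envelope sketch. The 16,000 columns come from the comment above; the user count and the number of (user, movie) rows are hypothetical placeholders, and the byte counts are approximations (4 bytes per cell for a dense integer matrix; a 4-byte row index plus an 8-byte double per stored entry for a `dgCMatrix`).

```r
# Approximate size in GiB of a dense integer matrix (4 bytes per cell)
dense_gib <- function(n_users, n_movies) {
  n_users * n_movies * 4 / 1024^3
}

# Approximate size in GiB of a dgCMatrix: 12 bytes per stored entry
# plus one 4-byte column pointer per column (and one extra)
sparse_gib <- function(n_pairs, n_movies) {
  (n_pairs * 12 + (n_movies + 1) * 4) / 1024^3
}

n_movies <- 16000   # from the comment above
n_users  <- 1e6     # hypothetical number of distinct users
n_pairs  <- 1e8     # hypothetical number of (user, movie) rows

dense_gib(n_users, n_movies)    # roughly 60 GiB
sparse_gib(n_pairs, n_movies)   # a little over 1 GiB
```

Under these assumed numbers the dense result would not fit in 16 GB of RAM, while the sparse one would, which is consistent with the allocation errors reported above.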