
I have a fairly large data frame in R with two columns. I am trying to create dummy variables from the Code column (a factor with 858 levels), but RStudio crashes every time I try.

> str(d)
'data.frame':   649226 obs. of  2 variables:
 $ User: int  210 210 210 210 269 317 317 317 317 326 ...
 $ Code      : Factor w/ 858 levels "AA02","AA03",..: 164 494 538 626 464 496 435 464 475 163 ... 

The User column is not unique: several rows can share the same User. It doesn't matter whether the result keeps the same number of rows or merges the rows for each User into a single row whose non-empty columns hold the counts of the corresponding Codes.

I found a couple of solutions that work for smaller datasets, but not for mine.

It would be great if you could recommend a method that is fast and works for this kind of data.

Thanks!

Kapuha
  • If I am understanding you correctly, you want to take a data frame with two columns and add `858 - 1` columns of dummy variables for the factor levels in `Code`? ... why? – rawr Mar 09 '14 at 18:50
  • What do you mean by "R Studio crashed"? Is there an error message? Do you have sufficient RAM for a 649226*859 data.frame? – Roland Mar 09 '14 at 18:57
  • @rawr, Yes, you are correct. I wanted to run methods (e.g. `Recommender`) from the `recommenderlab` package. The vignette says you need this kind of user-item purchase matrix to predict what users may buy in the future. See page 21 of the [recommenderlab package](http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf) vignette. – Kapuha Mar 09 '14 at 18:58
  • look at the command `disjunctive` from the library `sampling` – Davide Passaretti Mar 09 '14 at 19:04
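
For a rough sense of the memory Roland is asking about, here is a back-of-the-envelope estimate. It assumes a dense result with one row per original observation and 858 dummy columns, stored either as 4-byte integers or 8-byte doubles; the variable names are made up for illustration. Merging rows per User would shrink the row count by an amount the question does not state.

```r
n_rows <- 649226   # observations reported by str(d)
n_cols <- 858      # one dummy column per level of Code
n_cells <- n_rows * n_cols

bytes_per_gib <- 1024^3
gib_integer <- n_cells * 4 / bytes_per_gib  # dense integer storage
gib_double  <- n_cells * 8 / bytes_per_gib  # dense numeric (double) storage

cat(sprintf("cells: %.0f\n", n_cells))
cat(sprintf("dense integer: %.2f GiB, dense double: %.2f GiB\n",
            gib_integer, gib_double))
```

Intermediate copies made while reshaping may add to these figures, so a machine with only a few GB of RAM could plausibly run out of memory on the unmerged version.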

1 Answer


This worked perfectly for me:

library(reshape2)
m <- acast(data = d, User ~ Code)

The only catch was that it produced NAs instead of 0s, but that is easy to fix:

m[is.na(m)] <- 0
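
As an alternative sketch (not the `reshape2` method above), base R's `xtabs` can build the same User-by-Code table of counts, and its `sparse = TRUE` argument returns a sparse matrix when the `Matrix` package is installed. The toy data frame below is invented to mirror the structure of `d`; the names `counts`, `counts_sparse`, and `dummies` are illustrative only.

```r
# Toy data with the same two columns as `d` in the question (values are made up).
d <- data.frame(
  User = c(210L, 210L, 210L, 269L, 317L, 317L),
  Code = factor(c("AA02", "AA03", "AA02", "AB10", "AA03", "AB10"))
)

# One row per User, one column per Code level; cells hold counts, 0 when absent.
counts <- xtabs(~ User + Code, data = d)

# Same table stored sparsely, so zero cells take no memory.
# Guarded because it needs the Matrix package to be installed.
if (requireNamespace("Matrix", quietly = TRUE)) {
  counts_sparse <- xtabs(~ User + Code, data = d, sparse = TRUE)
}

# 0/1 indicators (bought at least once) instead of counts.
dummies <- (counts > 0) * 1L

print(counts)
```

With most User/Code combinations empty, the sparse form is likely to be far smaller than a dense matrix, though whether it fits in memory for the full data depends on how many distinct pairs actually occur.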
Kapuha