
I have a dataset with the columns search_query [factor], movie_name [factor], and clicks [int], about 1,800,000 rows in total. When I use the dcast function from the reshape2 package to build a matrix of search queries by movie names, with clicks as the value, I get this error:

    train.matrix <- dcast(train, query ~ movie, value.var = "clicks")

    Aggregation function missing: defaulting to length
    Error in .Call("split_indices", index, group, as.integer(n)) : 
       negative length vectors are not allowed
    In addition: Warning message:
    In split_indices(seq_along(.value), .group, .n) :
      NAs introduced by coercion
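
As far as I know, the "Aggregation function missing" message appears when some query/movie combination occurs in more than one row, so dcast falls back to counting rows instead of using the click values. A minimal base R sketch for checking this and pre-summing the repeats follows; the toy `train` data frame and the names `has_dups` and `train.agg` are made up for illustration and are not from the post.

```r
# Toy stand-in for `train` with one repeated (query, movie) pair
train <- data.frame(
  query  = factor(c("a", "a", "b", "a")),
  movie  = factor(c("m1", "m2", "m1", "m1")),
  clicks = c(2L, 1L, 4L, 5L)
)

# TRUE if at least one query/movie combination occurs more than once
has_dups <- anyDuplicated(train[c("query", "movie")]) > 0
has_dups

# Collapse repeats by summing clicks, so each combination appears once
train.agg <- aggregate(clicks ~ query + movie, data = train, FUN = sum)
train.agg
```

If summing is the intended behaviour, passing `fun.aggregate = sum` to dcast should have the same effect without the extra step.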

If I subset the data to the first 100,000 rows, dcast from the reshape2 package runs just fine:

    train.matrix <- dcast(train[1:100000,], query ~ movie, value.var = "clicks")

There are 69,598 distinct movies, and the click values are all positive with no NAs. I am running R version 2.15.1.
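
One plausible reading of the error (not confirmed) is that the dense result simply has too many cells: in R 2.15.x a vector cannot hold more than 2^31 - 1 elements. A rough size check is sketched below. The movie count comes from the post, while `n_queries` is a hypothetical number, since the post does not say how many distinct queries there are.

```r
# Rough size check for the dense query x movie table
n_movies  <- 69598                  # from the post
n_queries <- 50000                  # hypothetical: not stated in the post
max_len   <- .Machine$integer.max   # 2^31 - 1, the vector length cap in R 2.15.x

# Multiply as doubles so the product itself does not overflow
cells <- as.numeric(n_movies) * n_queries
cells > max_len

# Smallest number of distinct queries at which the table exceeds the cap
min_queries <- floor(max_len / n_movies) + 1
min_queries
```

Under this assumption, anything above roughly 30,000 distinct queries would be enough to push a 69,598-column table past the limit.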

What could be the problem? Is the data set too large? If so, how can I achieve the same outcome with this dataset?
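
If the dense table is indeed the bottleneck, one possible workaround is to store only the non-empty cells in a sparse matrix. This is a sketch using `sparseMatrix()` from the Matrix package, not something tried on the real data; it assumes most query/movie cells are empty, and the toy `train` data frame and the name `train.sparse` are made up for illustration.

```r
library(Matrix)  # recommended package that ships with R

# Toy stand-in for `train`; the real data has about 1.8 million rows
train <- data.frame(
  query  = factor(c("a", "a", "b", "c", "a")),
  movie  = factor(c("m1", "m2", "m1", "m3", "m1")),
  clicks = c(2L, 1L, 4L, 3L, 5L)
)

# Factor codes serve as row/column indices.
# Values for repeated (i, j) pairs are added together.
train.sparse <- sparseMatrix(
  i = as.integer(train$query),
  j = as.integer(train$movie),
  x = as.numeric(train$clicks),
  dims = c(nlevels(train$query), nlevels(train$movie)),
  dimnames = list(levels(train$query), levels(train$movie))
)
train.sparse
```

Memory use then grows with the number of non-empty cells rather than with queries times movies.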

Thanks so much in advance!

user1460878
  • If I had a data set that large, I would not expect the "standard" tools in R to be of much use in many cases. [This](http://stackoverflow.com/questions/6902087/proper-fastest-way-to-reshape-a-data-table) question using data.table might be more fruitful. Beyond that I'd start investigating SQL databases or your own C code. – joran Sep 01 '12 at 21:32
  • @joran Good tip, but interesting that the accepted answer in that link recommends `tapply`, which is more "standard" R than `reshape2` – Andrie Sep 01 '12 at 21:53
  • @Andrie Indeed, I read too quickly. I was thinking there was a way to leverage data.tables for some performance gains when reshaping long -> wide, but perhaps not. – joran Sep 01 '12 at 22:10
  • Have you tried using base R's `reshape()` function? – Josh O'Brien Sep 01 '12 at 23:15
  • Goodness, my comment is probably somewhat less relevant now that you've downsized your data by a factor of 1000. – joran Sep 01 '12 at 23:35
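
For reference, the `tapply` approach mentioned in the comments could look roughly like the sketch below. The toy `train` data frame and the name `wide` are made up for illustration. Note that the result is still a dense matrix, so it would presumably run into the same size limit on the full data.

```r
# Toy stand-in for `train`
train <- data.frame(
  query  = factor(c("a", "a", "b", "a")),
  movie  = factor(c("m1", "m2", "m1", "m1")),
  clicks = c(2L, 1L, 4L, 5L)
)

# Rows are queries, columns are movies, cells are summed clicks
wide <- tapply(train$clicks, list(train$query, train$movie), sum)

# Combinations that never occur come back as NA; treat them as zero clicks
wide[is.na(wide)] <- 0
wide
```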
