I have a dataset with the columns search_query (factor), movie_name (factor) and clicks (int), about 1,800,000 rows in total. When I use the dcast function from the reshape2 package to build a matrix of search queries by movie names, with clicks as the values, I get this error:
```r
train.matrix <- dcast(train, query ~ movie, value.var = "clicks")
```

```
Aggregation function missing: defaulting to length
Error in .Call("split_indices", index, group, as.integer(n)) :
  negative length vectors are not allowed
In addition: Warning message:
In split_indices(seq_along(.value), .group, .n) :
  NAs introduced by coercion
```
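The first line of that output means dcast found more than one row for at least some (query, movie) pairs, so with no aggregation function it counts rows instead of using the clicks. A minimal base-R check for such duplicates, assuming the column names used in the dcast call above (the helper name `n_dup_pairs` is hypothetical):

```r
# Count rows whose (query, movie) pair already appeared earlier in the data.
# Assumes `train` has columns named `query` and `movie`, as in the dcast() call.
n_dup_pairs <- function(train) {
  sum(duplicated(train[, c("query", "movie")]))
}

# If this is > 0, summing the clicks per cell is probably the intended result:
# dcast(train, query ~ movie, value.var = "clicks", fun.aggregate = sum)
```

Passing `fun.aggregate = sum` only changes how duplicate cells are combined. It does not make the result any smaller.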
If I subset the data to the first 100,000 rows, dcast runs fine:

```r
train.matrix <- dcast(train[1:100000, ], query ~ movie, value.var = "clicks")
```
There are 69,598 distinct movies, and the click values are all positive with no NAs. I'm running R version 2.15.1.
What could be the problem? Is the dataset too large? If so, how can I get the same result with this dataset?
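One plausible explanation, worth checking: dcast has to index every query-by-movie cell of the result. The `as.integer(n)` call in the error and the "NAs introduced by coercion" warning are consistent with that cell count exceeding R's integer limit of 2^31 - 1 (`.Machine$integer.max`). A rough sketch of the arithmetic, assuming the `query` and `movie` columns from the dcast call. The helper names are hypothetical, and counting only the distinct values actually present is an assumption about how dcast sizes its output by default:

```r
# Number of cells a dense query-by-movie result would need.
# Assumes `train` has columns `query` and `movie`, as in the dcast() call.
dense_cells <- function(train) {
  n_query <- length(unique(train$query))
  n_movie <- length(unique(train$movie))
  # Multiply as doubles so that this product cannot itself overflow.
  as.numeric(n_query) * as.numeric(n_movie)
}

# TRUE if that cell count still fits in a 32-bit R integer.
fits_int_index <- function(train) {
  dense_cells(train) <= .Machine$integer.max
}

# With 69,598 movies, the limit is crossed above this many distinct queries.
max_queries <- floor(.Machine$integer.max / 69598)  # 30855
```

Even just under that limit, a dense integer table of about 2.1 billion cells would need roughly 8.6 GB of memory. Most of those cells would be empty, because 1,800,000 rows can fill at most 1,800,000 of them.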
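If the dense result is what breaks, one common workaround is to store the matrix sparsely and keep only the non-zero cells. A sketch using `sparseMatrix()` from the Matrix package. The wrapper name `to_sparse_clicks` is hypothetical, and the column names are assumed from the dcast call. `sparseMatrix()` adds up the `x` values of repeated (i, j) pairs, so duplicate query/movie rows end up with summed clicks:

```r
library(Matrix)

# Build a sparse query-by-movie matrix of summed clicks.
# Assumes `train` has columns `query`, `movie` (factors) and `clicks` (numeric).
to_sparse_clicks <- function(train) {
  q <- droplevels(as.factor(train$query))   # row labels
  m <- droplevels(as.factor(train$movie))   # column labels
  sparseMatrix(
    i = as.integer(q),                      # row index = query level code
    j = as.integer(m),                      # column index = movie level code
    x = train$clicks,                       # repeated (i, j) pairs are summed
    dims = c(nlevels(q), nlevels(m)),
    dimnames = list(levels(q), levels(m))
  )
}
```

Usage would be `train.matrix <- to_sparse_clicks(train)`, which returns a `dgCMatrix` that can be indexed by query and movie names. `xtabs(clicks ~ query + movie, data = train, sparse = TRUE)` from base R's stats package should give a similar sparse table of summed clicks.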
Thanks so much in advance!