I have a data set of roughly 25,000 rows by 10 columns. Several of the columns are categorical variables with a large number of levels. I'm trying to convert the data set to SVM-Light format using the RSofia package, but I get the following error:

Error in model.matrix.default(formula, data) : 
  allocMatrix: too many elements specified

I have successfully converted the data set to a sparse model matrix using sparse.model.matrix() from the Matrix package, and I'd like to know whether it's possible to write an SVM-Light formatted file directly from that object.

My code is below:

library(RSofia)
library(Matrix)

n <- 100000
# Simulated data: two 26-level factors, one numeric column, and one
# high-cardinality factor (col4) built from col2 and col3
df1 <- data.frame(
  id     = 1:n,
  target = round(runif(n), 0),
  col1   = factor(letters[sample(1:26, n, replace = TRUE)]),
  col2   = factor(letters[sample(1:26, n, replace = TRUE)]),
  col3   = round(runif(n) * 1000, 0)
)
df1$col4 <- with(df1, factor(paste(col2, col3, sep = '')))
head(df1)
length(unique(df1$col4))
str(df1)

varsToUse <- c('col1', 'col2', 'col3', 'col4')

# This succeeds and returns a sparse design matrix
smm1 <- sparse.model.matrix(df1$target ~ 0 + ., data = df1[, varsToUse])

I get errors when I run this code:

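# Attempt 1: pass the sparse matrix to parse_formula
x <- parse_formula(smm1$target ~ 0 + ., data = smm1[, varsToUse])
# Attempt 2: pass the original data frame to parse_formula
x <- parse_formula(df1$target ~ 0 + ., data = df1[, varsToUse])

tmp <- tempfile()
write.svmlight(x$labels, x$data, tmp)
readLines(tmp)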

Any suggestions?

screechOwl
  • Did you ever find a way to do this? I am looking at the same problem. – B_Miner Jul 22 '13 at 17:26
  • Sadly, no. I think I have an idea of how to do it, but it will take a while to write the code. – screechOwl Jul 22 '13 at 19:53
  • I wrote a function to convert a data frame to svm-light format: http://stackoverflow.com/a/24143226/190791. Works for a binary classification. I am not sure if it helps here. – Timothée HENRY Jun 11 '14 at 08:56
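
One possible approach is to skip parse_formula entirely and write the file straight from the sparse matrix slots. The following is only a minimal sketch, not part of RSofia or Matrix: write_svmlight_sparse is a hypothetical helper name, and it assumes the matrix holds numeric values (as a sparse.model.matrix result does) and that each output line should be "label index:value ..." with 1-based feature indices.

```r
library(Matrix)

# Hypothetical helper (not from RSofia or Matrix): write a numeric sparse
# matrix x and a label vector y to `file` in SVM-Light format.
write_svmlight_sparse <- function(x, y, file) {
  stopifnot(nrow(x) == length(y))
  # Transpose so that each column of xt holds one row of x; in
  # column-compressed storage those entries are contiguous and sorted.
  xt <- as(t(x), "CsparseMatrix")
  # "index:value" tokens; slot i is 0-based, SVM-Light indices are 1-based
  feats <- paste0(xt@i + 1L, ":", xt@x)
  # Row of x that each token belongs to; levels keep all-zero rows
  row_id <- factor(rep(seq_len(ncol(xt)), diff(xt@p)),
                   levels = seq_len(ncol(xt)))
  rows <- vapply(split(feats, row_id), paste, character(1), collapse = " ")
  # Prefix each line with its label; trim the trailing space of empty rows
  writeLines(trimws(paste(y, rows)), file)
}

# Small usage example on a 3 x 3 matrix with one all-zero row
m <- sparseMatrix(i = c(1, 1, 3), j = c(1, 3, 2), x = c(1, 2.5, 4),
                  dims = c(3, 3))
tmp <- tempfile()
write_svmlight_sparse(m, c(1, 0, 1), tmp)
readLines(tmp)
```

For the data in the question, the call would presumably be write_svmlight_sparse(smm1, df1$target, tmp). Building the lines from the slots avoids ever creating the dense matrix that triggers the allocMatrix error.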
