
For SVM modeling in R, I have been using the kernlab package (the ksvm method) on a Windows XP machine with 2 GB of RAM. However, with 201,497 data rows I cannot provide enough memory for the modeling step (the error is that R cannot allocate a vector of size greater than 2.7 GB).

I therefore tried Amazon micro and large instances for the SVM modeling, but I run into the same issue as on my local machine (cannot allocate a vector of size greater than 2.7 GB).

Can anyone suggest a solution for modeling with big data like this, or point out whether I am doing something wrong?

Vignesh
  • How large is your data? If the system is trying to allocate 2.7GB for your data, or an enhanced structure associated with your data, and you only have 2GB of physical RAM, this will not work out well. – image_doctor Dec 09 '12 at 13:05
  •
    This may also be useful http://www.numbertheory.nl/2012/08/15/predicting-the-memory-usage-of-an-r-object-containing-numbers/ – image_doctor Dec 09 '12 at 13:33

1 Answer


Without a reproducible example it is hard to say if the dataset is just too big, or if some parts of your script are suboptimal. A few general pointers:

  • Take a look at the High Performance Computing Task View; it lists the main R packages relevant for working with big data.
  • You use your entire dataset for training your model. You could try to take a subset (say 10%) and fit your model on that. Repeating this procedure a few times will yield insight into whether the model fit is sensitive to which subset of the data you use (see the sketch after this list).
  • Some analysis techniques, e.g. PCA, can be performed by processing the data iteratively, i.e. in chunks. This makes analyses of very big datasets (>> 100 GB) possible. I'm not sure whether this is possible with kernlab.
  • Check if the R version you are using is 64 bit.
  • This earlier question might be of interest.
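
As an illustration of the subsampling idea, here is a minimal sketch. It assumes your data sits in a data frame `df` with a factor response column `y`; both names, the number of repeats, and the 10% fraction are placeholders, not part of the original question:

```r
library(kernlab)

# Assumes a 64-bit R build; a 32-bit R can address far less memory.
stopifnot(.Machine$sizeof.pointer == 8)

set.seed(42)

# Fit the SVM on a few random 10% subsets of the data, so you can compare
# the resulting models and see how sensitive the fit is to the subset used.
fits <- lapply(1:5, function(i) {
  idx <- sample(nrow(df), size = floor(0.1 * nrow(df)))
  ksvm(y ~ ., data = df[idx, ], kernel = "rbfdot")
})

# Training error of each subset model
sapply(fits, error)
```

If the training errors (and cross-validated errors, if you pass `cross` to ksvm) are similar across subsets, fitting on a fraction of the 201,497 rows may be good enough and will stay well within 2 GB of RAM.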
Paul Hiemstra