I'm using Java-ML to train a classifier. Each instance in my data is a sparse vector in a format like this:

1 0:5 1:9 24:2 ......

When I read these lines from a file, I split each one with String.split and put the values into a SparseInstance, which is then added to the dataset used to train the classifier.
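For illustration, one line in this format can also be parsed by splitting on whitespace and then on the colon, which keeps working if a value contains a decimal point. This is only a rough sketch: `SparseLineParser`, `parseLabel`, and `parseFeatures` are hypothetical names, and a plain `TreeMap` stands in for the Java-ML `SparseInstance`. The label mapping (1 means positive, anything else negative) is taken from the code in this question.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: parses lines such as "1 0:5 1:9 24:2". */
final class SparseLineParser {

    /** Maps the leading numeric label to "positive" (1) or "negative" (anything else). */
    static String parseLabel(String line) {
        String first = line.trim().split("\\s+", 2)[0];
        return Double.parseDouble(first) == 1.0 ? "positive" : "negative";
    }

    /** Returns feature index -> value for every "index:value" token after the label. */
    static Map<Integer, Double> parseFeatures(String line) {
        String[] tokens = line.trim().split("\\s+");
        Map<Integer, Double> features = new TreeMap<>();
        for (int i = 1; i < tokens.length; i++) { // token 0 is the label
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                continue; // skip tokens that are not index:value pairs
            }
            int index = Integer.parseInt(tokens[i].substring(0, colon));
            double value = Double.parseDouble(tokens[i].substring(colon + 1));
            features.put(index, value);
        }
        return features;
    }

    public static void main(String[] args) {
        String line = "1 0:5 1:9 24:2";
        System.out.println(parseLabel(line) + " " + parseFeatures(line));
    }
}
```

Each entry of the returned map could then be passed to `instance.put(index, value)` in place of the intermediate `double[]` array.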

However, I'm getting a Java heap space OutOfMemoryError. I've read that String.split() can cause memory leaks, so I wrap each token in new String to avoid that, but I'm still running out of heap space.

The code is as follows

////////////////////////////////////////

BufferedReader br = new BufferedReader(new FileReader("Repository\\IMDB Data\\Train.feat"));
Dataset data=new DefaultDataset();
String TrainLine;
int j=0;
while((TrainLine = br.readLine()) != null && j < 20000){
    //TrainLine.replaceAll(":", " ");
    String[] arr = TrainLine.split("\\D+");
    double[] nums = new double[arr.length];
    for (int i = 0; i < nums.length; i++) {
        nums[i] = Double.parseDouble(new String(arr[i]));
    }
    //vector has one less element than arr 85527
    String label;
    if(nums[0] == 1){
        label = "positive";
    }else{
        label = "negative";
    }
    System.out.println(label);
    Instance instance = new SparseInstance(85527,label);
    int i;
    for(i=1;i<arr.length;i=i+2){
        instance.put((int)nums[i],nums[i+1]);
        //Strings have been converted to new strings to overcome memory leak
    }
    data.add(instance);
    j++;
}
knn = new KNearestNeighbors(5);
knn.buildClassifier(data);

svm = new LibSVM();
svm.buildClassifier(data);

////////////////////////////////////////

Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
    at java.util.TreeMap.put(Unknown Source)
    at java.util.TreeSet.add(Unknown Source)
    at java.util.AbstractCollection.addAll(Unknown Source)
    at java.util.TreeSet.addAll(Unknown Source)
    at net.sf.javaml.core.SparseInstance.keySet(SparseInstance.java:144)
    at net.sf.javaml.core.SparseInstance.keySet(SparseInstance.java:27)
    at libsvm.LibSVM.transformDataset(LibSVM.java:80)
    at libsvm.LibSVM.buildClassifier(LibSVM.java:127)
    at backend.ShubhamKNN.<init>(ShubhamKNN.java:55)
  • http://stackoverflow.com/questions/37335/how-to-deal-with-java-lang-outofmemoryerror-java-heap-space-error-64mb-heap – Pratik Apr 02 '15 at 06:06
  • there is something wrong with your DataSet object. it might be failing at ` knn.buildClassifier(data);` – Prashant Apr 02 '15 at 06:25
  • @Prashant dataset is fine and it's not failing at knn.buildClassifier(data) .. – Shubham Sharma Apr 02 '15 at 08:33
  • @ShubhamSharma : see at stacktrace `at libsvm.LibSVM.buildClassifier(LibSVM.java:127)` means error is in `buildClassifier` method which you are calling. – Prashant Apr 02 '15 at 08:46

1 Answer

I get this error too; it happens when the dataset is too big.

Try running your code with only 1000 records; I would guess it runs fine. High memory use is a problem with LibSVM in my experience, and it regularly ends in this error:

java.lang.OutOfMemoryError: Java heap space

If your computer has enough memory (mine has 8 GB), you can raise the JVM heap limit for the class in Eclipse:

  1. In the Package Explorer view, select the class that calls the LibSVM library.

  2. From the menu, choose Run -> Run Configurations..., open the (x)= Arguments tab, and type -Xmx1024M into the VM arguments box. This caps the heap available to the class at 1024 MB. I set it to 3072M and my class runs fine.

  3. Rerun the class.

That is my solution; for more detail see: http://blog.csdn.net/felomeng/article/details/4688414
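To confirm that the -Xmx setting actually reached the JVM, a small check like the one below can be run with the same run configuration. It is a minimal sketch: the class name `HeapCheck` and its helper methods are made up for illustration, while `Runtime.maxMemory()`, `totalMemory()`, and `freeMemory()` are standard Java calls.

```java
/** Hypothetical helper: reports the heap limits the running JVM received. */
final class HeapCheck {
    private static final long MIB = 1024L * 1024L;

    /** Upper bound on the heap the JVM will attempt to use, in MiB (controlled by -Xmx). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently occupied by objects, in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("Used heap: " + usedHeapMiB() + " MiB");
    }
}
```

If the reported maximum is far below the value typed into the VM arguments box, the argument was probably added to a different run configuration than the one being launched.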

Yaroslav Stavnichiy