I'm using Java-ML to train a classifier. Each instance in my data is a sparse vector in a format like this:
1 0:5 1:9 24:2 ......
When I read these lines from a file, I split each one with String.split() and put the values into a SparseInstance, which is then added to the dataset used to train the classifier.
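To make the format concrete, here is a minimal standalone sketch of how one such line breaks down into a label and index:value pairs. It is not my actual code (that is further down) and it uses only java.util, not Java-ML. The names SparseLineParser, ParsedLine and parseLine are made up for illustration, and it assumes that a leading 1 means "positive" and anything else means "negative", as in my code.

```java
import java.util.Map;
import java.util.TreeMap;

final class SparseLineParser {

    /** Label plus the non-zero features of one input line (hypothetical helper type). */
    static final class ParsedLine {
        final String label;
        final Map<Integer, Double> features;

        ParsedLine(String label, Map<Integer, Double> features) {
            this.label = label;
            this.features = features;
        }
    }

    /** Parses a line such as "1 0:5 1:9 24:2" into a label and an index -> value map. */
    static ParsedLine parseLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        // Assumption: a first token of "1" means positive, anything else negative.
        String label = tokens[0].equals("1") ? "positive" : "negative";

        Map<Integer, Double> features = new TreeMap<>();
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected index:value but got " + tokens[i]);
            }
            int index = Integer.parseInt(tokens[i].substring(0, colon));
            double value = Double.parseDouble(tokens[i].substring(colon + 1));
            features.put(index, value);
        }
        return new ParsedLine(label, features);
    }

    public static void main(String[] args) {
        ParsedLine parsed = parseLine("1 0:5 1:9 24:2");
        System.out.println(parsed.label + " " + parsed.features);
    }
}
```

Only the pairs that actually appear on the line are stored, so the memory per line grows with the number of non-zero features rather than with the full vector width.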
However, I'm getting a Java heap space OutOfMemoryError. I've read that String.split() can cause memory leaks, so I wrap each token in new String(...) to avoid that, but I'm still running out of heap space.
The code is as follows:
////////////////////////////////////////
BufferedReader br = new BufferedReader(new FileReader("Repository\\IMDB Data\\Train.feat"));
Dataset data = new DefaultDataset();
String TrainLine;
int j = 0;
while ((TrainLine = br.readLine()) != null && j < 20000) {
    //TrainLine.replaceAll(":", " ");
    String[] arr = TrainLine.split("\\D+");
    double[] nums = new double[arr.length];
    for (int i = 0; i < nums.length; i++) {
        // Strings have been converted to new strings to overcome memory leak
        nums[i] = Double.parseDouble(new String(arr[i]));
    }
    // vector has one less element than arr 85527
    String label;
    if (nums[0] == 1) {
        label = "positive";
    } else {
        label = "negative";
    }
    System.out.println(label);
    Instance instance = new SparseInstance(85527, label);
    int i;
    for (i = 1; i < arr.length; i = i + 2) {
        instance.put((int) nums[i], nums[i + 1]);
    }
    data.add(instance);
    j++;
}
knn = new KNearestNeighbors(5);
knn.buildClassifier(data);
svm = new LibSVM();
svm.buildClassifier(data);
////////////////////////////////////////
Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
    at java.util.TreeMap.put(Unknown Source)
    at java.util.TreeSet.add(Unknown Source)
    at java.util.AbstractCollection.addAll(Unknown Source)
    at java.util.TreeSet.addAll(Unknown Source)
    at net.sf.javaml.core.SparseInstance.keySet(SparseInstance.java:144)
    at net.sf.javaml.core.SparseInstance.keySet(SparseInstance.java:27)
    at libsvm.LibSVM.transformDataset(LibSVM.java:80)
    at libsvm.LibSVM.buildClassifier(LibSVM.java:127)
    at backend.ShubhamKNN.<init>(ShubhamKNN.java:55)
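
Since the error is thrown inside LibSVM.transformDataset while key sets are being collected, one thing I can check is how much heap the JVM actually has available. Below is a minimal sketch using only java.lang.Runtime. HeapReport and describeHeap are hypothetical names of mine, not part of Java-ML or LibSVM.

```java
final class HeapReport {

    /** Returns a one-line summary of the JVM heap limits, in megabytes. */
    static String describeHeap() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        long max = rt.maxMemory() / mb;                        // upper bound the JVM may grow to
        long allocated = rt.totalMemory() / mb;                // heap currently reserved by the JVM
        long used = (rt.totalMemory() - rt.freeMemory()) / mb; // portion of that currently in use
        return "max=" + max + "MB, allocated=" + allocated + "MB, used=" + used + "MB";
    }

    public static void main(String[] args) {
        System.out.println(describeHeap());
    }
}
```

Calling describeHeap() just before svm.buildClassifier(data) would show how close the dataset alone already is to the limit. The limit can be raised with the standard -Xmx launcher option (for example -Xmx2g), though I don't know whether that alone is enough here.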