Use WEKA API to perform LSA on train and test set

Question

I need to use Weka and its AttributeSelection algorithm LatentSemanticAnalysis to do text classification. My dataset is split into training and test sets, and I want to apply LSA to both. I have read some posts about LSA, but I have not found how to apply it to two separate datasets and keep them compatible. This is what I have so far, but it runs out of memory:

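import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.LatentSemanticAnalysis;
import weka.attributeSelection.Ranker;
import weka.core.Instances;

// input is the Instances object to transform
AttributeSelection selecter = new AttributeSelection();
LatentSemanticAnalysis lsa = new LatentSemanticAnalysis();
Ranker rank = new Ranker();

selecter.setEvaluator(lsa);
selecter.setSearch(rank);
selecter.setRanking(true);

// Build the LSA model on input, then project input onto it
selecter.SelectAttributes(input);
Instances outputData = selecter.reduceDimensionality(input);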

Edit1 In response to @Jose's reply, I added a new version of my source code. It leads to an OutOfMemoryError:

AttributeSelection filter = new AttributeSelection(); // package weka.filters.supervised.attribute!
LatentSemanticAnalysis lsa = new LatentSemanticAnalysis();
Ranker rank = new Ranker();
filter.setEvaluator(lsa);
filter.setSearch(rank);
filter.setInputFormat(train);

train = Filter.useFilter(train, filter);
test = Filter.useFilter(test, filter);

Edit2 The error I am getting:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at weka.core.matrix.Matrix.getArrayCopy(Matrix.java:301)
at weka.core.matrix.SingularValueDecomposition.<init>(SingularValueDecomposition.java:76)
at weka.core.matrix.Matrix.svd(Matrix.java:913)
at weka.attributeSelection.LatentSemanticAnalysis.buildAttributeConstructor(LatentSemanticAnalysis.java:511)
at weka.attributeSelection.LatentSemanticAnalysis.buildEvaluator(LatentSemanticAnalysis.java:416)
at weka.attributeSelection.AttributeSelection.SelectAttributes(AttributeSelection.java:596)
at weka.filters.supervised.attribute.AttributeSelection.batchFinished(AttributeSelection.java:455)
at weka.filters.Filter.useFilter(Filter.java:682)
at test.main(test.java:44)

Have you tried increasing the heap memory allocated to your java program through the [-Xmx](http://stackoverflow.com/questions/1565388/increase-heap-size-in-java) command line argument to the java executable? — Steven Magana-Zook, Apr 07 '14 at 22:50
@StevenMagana-Zook I cranked it up to 4096MB and still OutOfMemoryError — RazorAlliance192, Apr 07 '14 at 22:56
How big is your data set, and how much RAM is on your computer? From your stack trace it seems that copying the svd matrix pushes you over your limit. — Steven Magana-Zook, Apr 08 '14 at 15:18
@StevenMagana-Zook I have 118 class attributes and 25,765 attributes used in 9,603 instances. This is for the trainset, for the test set I have same number of class and normal attributes but here I have 3,299 instances. If it would help resolve my issue, I am using the Reuters21578 (ModApte split) dataset. In total I have 8GB of RAM on my 2011 MBP intel core 2duo — RazorAlliance192, Apr 08 '14 at 16:05
My Eclipse is using only the default -Xmx even though I changed the value in the arguments. Can you tell what the problem is? — Jana, Dec 13 '16 at 06:45
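
To put the sizes from the comments above in perspective, here is a rough back-of-the-envelope sketch. It assumes the LSA step holds the training data as one dense matrix of doubles with a cell for every attribute/instance pair; that is an assumption based on the `weka.core.matrix.Matrix` frames in the stack trace, not something confirmed in this thread. The class and method names are made up for illustration.

```java
import java.util.Locale;

public class LsaMemoryEstimate {

    // Bytes needed for one dense matrix of doubles (8 bytes per cell).
    // Assumption: no sparse storage is used for the matrix.
    static long denseMatrixBytes(long rows, long cols) {
        return rows * cols * Double.BYTES;
    }

    public static void main(String[] args) {
        long attributes = 25_765L; // training-set attributes, as reported in the comments
        long instances = 9_603L;   // training-set instances, as reported in the comments

        long bytes = denseMatrixBytes(attributes, instances);
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);

        // One dense copy of the attribute-by-instance matrix
        System.out.println(String.format(Locale.ROOT,
                "one dense copy: %d bytes (about %.2f GiB)", bytes, gib));
        // The trace fails inside Matrix.getArrayCopy, i.e. while making a second copy
        System.out.println(String.format(Locale.ROOT,
                "two dense copies: about %.2f GiB", 2 * gib));
    }
}
```

If that assumption holds, a single copy is 1,979,370,360 bytes (about 1.84 GiB), so the matrix plus the copy made in `Matrix.getArrayCopy` already comes close to a 4096 MB heap before the decomposition allocates anything for its results. That would be consistent with the error persisting after raising -Xmx to 4096 MB.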

score 2 · Answer 1 · answered Apr 07 '14 at 20:58

Because AttributeSelection is also available as a filter, you can apply it in batch mode (-b option) to a training and a test subset at once, so that the test dataset is represented in the dimensions defined from the training set.

You can check how to do this in a program on the Weka wiki page "Use Weka in your Java code", under Filter, Batch filtering.

Jose Maria Gomez Hidalgo

Thank you, I have something running, but I end up with an OutOfMemoryError. I have added the piece of code that is used for the transformation with LSA. – RazorAlliance192 Apr 07 '14 at 22:39

Is there any possibility you could help me out, sir? I would appreciate it so much! – RazorAlliance192 Apr 08 '14 at 19:56

Hello RazorAlliance192, did you find a solution for this OutOfMemoryError: Java heap space? – Jana Dec 13 '16 at 06:40
