I am using Java 8 with Apache OpenNLP. I have a service that extracts all the nouns from a paragraph. This works as expected on my localhost server, and it also ran on an OpenShift server with no problems. However, it uses a lot of memory, and I now need to deploy the application to an AWS Elastic Beanstalk Tomcat server.
One solution would be to upgrade the AWS Elastic Beanstalk instance from t1.micro to a larger instance type, but I am on a small budget and want to avoid the extra fees if possible.
When the app runs and tries to do the word chunking, it fails with the following error:
dispatch failed; nested exception is java.lang.OutOfMemoryError: Java heap space] with root cause
java.lang.OutOfMemoryError: Java heap space
    at opennlp.tools.ml.model.AbstractModelReader.getParameters(AbstractModelReader.java:148)
    at opennlp.tools.ml.maxent.io.GISModelReader.constructModel(GISModelReader.java:75)
    at opennlp.tools.ml.model.GenericModelReader.constructModel(GenericModelReader.java:59)
    at opennlp.tools.ml.model.AbstractModelReader.getModel(AbstractModelReader.java:87)
    at opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:35)
    at opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:31)
    at opennlp.tools.util.model.BaseModel.finishLoadingArtifacts(BaseModel.java:328)
    at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:256)
    at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:179)
    at opennlp.tools.parser.ParserModel.<init>(ParserModel.java:180)
    at com.jobs.spring.service.lang.LanguageChunkerServiceImpl.init(LanguageChunkerServiceImpl.java:35)
    at com.jobs.spring.service.lang.LanguageChunkerServiceImpl.getNouns(LanguageChunkerServiceImpl.java:46)
Question
Is there a way to either:
1. reduce the amount of memory used when extracting the nouns from a paragraph,
2. use a different API (other than Apache OpenNLP) that doesn't use as much memory, or
3. configure the AWS Elastic Beanstalk Tomcat server to cope with the demands?
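On option 3, the Elastic Beanstalk Tomcat platform lets you set the JVM heap through the `aws:elasticbeanstalk:container:tomcat:jvmoptions` option namespace, for example in an `.ebextensions` config file. A minimal sketch, where the values are assumptions to tune (a t1.micro has well under 1 GB of RAM, so there is limited headroom either way):

```yaml
# .ebextensions/jvm.config -- Elastic Beanstalk Tomcat platform option settings
option_settings:
  aws:elasticbeanstalk:container:tomcat:jvmoptions:
    Xms: 256m   # initial heap (placeholder value)
    Xmx: 512m   # maximum heap (placeholder value)
```

This only redistributes the instance's memory; if the model genuinely needs more than the instance has, a larger instance type or a smaller model is the real fix.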
Code Sample:
import java.io.IOException;
import java.io.InputStream;
import java.util.HashSet;
import java.util.Set;

import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;
import opennlp.tools.parser.ParserFactory;
import opennlp.tools.parser.ParserModel;
import opennlp.tools.util.InvalidFormatException;

@Component("languageChunkerService")
@Transactional
public class LanguageChunkerServiceImpl implements LanguageChunkerService {

    private Set<String> nouns = null;
    private ParserModel model = null;
    private Parser parser = null;

    public void init() throws InvalidFormatException, IOException {
        // Read the model straight off the classpath; going through
        // getResource().getFile() breaks once the app is packaged as a WAR.
        try (InputStream modelInParse = getClass().getClassLoader()
                .getResourceAsStream("en-parser-chunking.bin")) {
            // load chunking model
            model = new ParserModel(modelInParse); // line 35 in the stack trace
            // create parser
            parser = ParserFactory.create(model);
        }
    }

    @Override
    public Set<String> getNouns(String sentenceToExtract) {
        Set<String> extractedNouns = new HashSet<>();
        nouns = new HashSet<>();
        try {
            if (parser == null) { // load the model once, then reuse it
                init();
            }
            Parse[] topParses = ParserTool.parseLine(sentenceToExtract, parser, 1);
            // walk each parse tree, collecting nouns
            for (Parse p : topParses) {
                getNounPhrases(p);
            }
            // normalize the collected words
            for (String s : nouns) {
                String word = s.replaceAll("[^a-zA-Z ]", "").toLowerCase();
                extractedNouns.add(word);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return extractedNouns;
    }

    // recursively walk the parse tree, extracting single nouns (tag "NN")
    private void getNounPhrases(Parse p) {
        if (p.getType().equals("NN")) {
            nouns.add(p.getCoveredText());
        }
        for (Parse child : p.getChildren()) {
            getNounPhrases(child);
        }
    }
}
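On option 1, the biggest lever in the code itself is guaranteeing the large chunking model is loaded exactly once per JVM rather than once per request or per bean. The lazy `if (parser == null) init();` check above is not thread-safe under Tomcat's concurrent requests, so two requests can load the model simultaneously and double the peak heap. A minimal sketch of the initialization-on-demand holder idiom that avoids this, with a hypothetical `HeavyModel` standing in for the OpenNLP `ParserModel`:

```java
public class ModelHolder {

    // Hypothetical stand-in for the expensive-to-load OpenNLP ParserModel.
    static class HeavyModel {
        HeavyModel() {
            // imagine reading en-parser-chunking.bin from the classpath here
        }
    }

    // Initialization-on-demand holder: the JVM initializes this nested class
    // (and therefore the model) only on first access to get(), and class
    // loading rules guarantee it happens exactly once, thread-safely.
    private static class Lazy {
        static final HeavyModel INSTANCE = new HeavyModel();
    }

    public static HeavyModel get() {
        return Lazy.INSTANCE;
    }

    public static void main(String[] args) {
        HeavyModel a = get();
        HeavyModel b = get();
        System.out.println(a == b); // prints "true": one shared instance
    }
}
```

With Spring, the same effect can be had by doing the load in a `@PostConstruct` method on a singleton-scoped bean, so the model is read once at startup instead of on the first request.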
UPDATE
Tomcat 8 config:
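For a standalone Tomcat 8 (as opposed to the Elastic Beanstalk-managed one), heap settings normally go in `bin/setenv.sh` via `CATALINA_OPTS`; a minimal sketch, with placeholder values to tune against the instance's available RAM:

```shell
# bin/setenv.sh -- sourced by catalina.sh at startup if present;
# CATALINA_OPTS applies to the server process only, not the stop command
export CATALINA_OPTS="-Xms256m -Xmx512m"
```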