I have a very large database (approximately 30 million records, each with at least 26 fields) which I have indexed with Apache Lucene Java.
I am constructing a query from two search terms. Each search term could appear in any one of nine fields, and I want the query to return a Document if both search terms appear in any of the relevant fields in that Document. The query is structured like so:
private Query CreateQuery(String theSearchTerm, String theField) throws ParseException
{
StandardAnalyzer theAnalyzer = new StandardAnalyzer(Version.LUCENE_35);
Query q;
QueryParser qp = new QueryParser(Version.LUCENE_35, theField, theAnalyzer);
qp.setDefaultOperator(QueryParser.Operator.AND);
qp.setAllowLeadingWildcard(true);
q = qp.parse(theSearchTerm);
return q;
}
public ScoreDoc[] RunTheQuery(String searchTerm1, String searchTerm2) throws IOException, ParseException
{
Directory theIndex = new SimpleFSDirectory(new File("C:\\MyDirectory"));
IndexSearcher theSearcher = new IndexSearcher(IndexReader.open(theIndex));
BooleanQuery theTopLevelBooleanQuery = new BooleanQuery();
BooleanQuery fields1 = new BooleanQuery();
BooleanQuery fields2 = new BooleanQuery();
BooleanQuery fields3 = new BooleanQuery();
BooleanQuery fields4 = new BooleanQuery();
BooleanQuery fields5 = new BooleanQuery();
BooleanQuery fields6 = new BooleanQuery();
BooleanQuery fields7 = new BooleanQuery();
BooleanQuery fields8 = new BooleanQuery();
BooleanQuery fields9 = new BooleanQuery();
BooleanQuery innerQuery = new BooleanQuery();
fields1.add(CreateQuery(searchTerm1, param1), BooleanClause.Occur.MUST);
fields1.add(CreateQuery(searchTerm2, param2), BooleanClause.Occur.MUST);
fields2.add(CreateQuery(searchTerm1, param3), BooleanClause.Occur.MUST);
fields2.add(CreateQuery(searchTerm2, param4), BooleanClause.Occur.MUST);
fields3.add(CreateQuery(searchTerm1, param5), BooleanClause.Occur.MUST);
fields3.add(CreateQuery(searchTerm2, param6), BooleanClause.Occur.MUST);
fields4.add(CreateQuery(searchTerm1, param7), BooleanClause.Occur.MUST);
fields4.add(CreateQuery(searchTerm2, param8), BooleanClause.Occur.MUST);
fields5.add(CreateQuery(searchTerm1, param9), BooleanClause.Occur.MUST);
fields5.add(CreateQuery(searchTerm2, param10), BooleanClause.Occur.MUST);
fields6.add(CreateQuery(searchTerm1, param11), BooleanClause.Occur.MUST);
fields6.add(CreateQuery(searchTerm2, param12), BooleanClause.Occur.MUST);
fields7.add(CreateQuery(searchTerm1, param13), BooleanClause.Occur.MUST);
fields7.add(CreateQuery(searchTerm2, param14), BooleanClause.Occur.MUST);
fields8.add(CreateQuery(searchTerm1, param15), BooleanClause.Occur.MUST);
fields8.add(CreateQuery(searchTerm2, param16), BooleanClause.Occur.MUST);
fields9.add(CreateQuery(searchTerm1, param17), BooleanClause.Occur.MUST);
fields9.add(CreateQuery(searchTerm2, param18), BooleanClause.Occur.MUST);
innerQuery.add(fields1, BooleanClause.Occur.SHOULD);
innerQuery.add(fields2, BooleanClause.Occur.SHOULD);
innerQuery.add(fields3, BooleanClause.Occur.SHOULD);
innerQuery.add(fields4, BooleanClause.Occur.SHOULD);
innerQuery.add(fields5, BooleanClause.Occur.SHOULD);
innerQuery.add(fields6, BooleanClause.Occur.SHOULD);
innerQuery.add(fields7, BooleanClause.Occur.SHOULD);
innerQuery.add(fields8, BooleanClause.Occur.SHOULD);
innerQuery.add(fields9, BooleanClause.Occur.SHOULD);
theTopLevelBooleanQuery.add(innerQuery, BooleanClause.Occur.MUST);
TopScoreDocCollector collector = TopScoreDocCollector.create(200, true);
//Heap space error occurs here
theSearcher.search(theTopLevelBooleanQuery, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
return hits;
}
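To make the shape of the query easier to see, here is a small sketch in plain Java (no Lucene calls) that prints the structure in query-parser-style notation, where `+` marks a MUST clause and an unprefixed group is a SHOULD clause. The class name `QueryShape`, the method `describe`, and the use of the parameter names as literal field names are placeholders for illustration only; they are not part of my real code.

```java
import java.util.ArrayList;
import java.util.List;

public class QueryShape {
    // Returns a description of the form:
    // +((+fieldA:term1 +fieldB:term2) (+fieldC:term1 +fieldD:term2) ...)
    // Each inner group mirrors one of the fieldsN BooleanQuery objects above.
    static String describe(String term1, String term2, String[][] fieldPairs) {
        List<String> groups = new ArrayList<String>();
        for (String[] pair : fieldPairs) {
            // Both terms are MUST clauses within a group.
            groups.add("(+" + pair[0] + ":" + term1 + " +" + pair[1] + ":" + term2 + ")");
        }
        // The groups are SHOULD clauses of the inner query,
        // which is itself a MUST clause of the top-level query.
        return "+(" + String.join(" ", groups) + ")";
    }

    public static void main(String[] args) {
        // Placeholder field names; the real query has nine such pairs.
        String[][] pairs = { {"param1", "param2"}, {"param3", "param4"} };
        System.out.println(describe("foo", "bar", pairs));
    }
}
```

With two pairs this prints `+((+param1:foo +param2:bar) (+param3:foo +param4:bar))`; the real query simply has nine groups instead of two.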
My problem is that when I call the IndexSearcher.search() method, the java.exe process on the server (Windows Server 2003 R2) consumes more than 540 MB of memory, which causes a Java heap space error. For completeness, the Java app is running on a web server (currently Oracle GlassFish, although I'm looking to move to Apache Tomcat).
Does anyone have an idea for how to stop this heap space error? A StackOverflow post (http://stackoverflow.com/questions/7259736/cant-open-lucene-index-java-heap-space) seems to address a similar problem, but doesn't really give a detailed answer.
Is the only answer to increase the amount of memory that the Java process can use? Or is the only answer to write a new searcher? If so, can anyone recommend a good article about lightweight searchers?
Is there a way of solving this issue by modifying the above code?
Any help would be gratefully received. Thanks, Rik