1

I am relatively new to the Java world and started using Solr recently. I am running Solr 5.2.1 on an Amazon t2.small box (single core, 2 GB RAM) running Ubuntu Server, with Solr given a 1 GB heap. The Solr core currently has 8M documents with 15 fields, 14 of which are plain string IDs; the remaining field is of the DateRangeField type.

The search queries are typically long, on the order of 15,000-20,000 characters. This is because the filter queries combine hundreds of field values each. For example,

/select?fq=field1:("value-1"+OR+"value-2"+.......+OR+"value-n"), where n ranges from 1000-2000

I modified Jetty's MaxURLLength to 65535, which allowed these long URLs through.
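For reference, a minimal sketch of the kind of change involved, assuming the stock server/etc/jetty.xml shipped with Solr 5.x (the exact element and default values may differ in your install); Jetty bounds the request line by the request header size, so raising that limit is what lets such long GET URLs through:

    <!-- server/etc/jetty.xml (assumed layout): allow very long GET URLs -->
    <New id="httpConfig" class="org.eclipse.jetty.server.HttpConfiguration">
      <Set name="requestHeaderSize">65535</Set>
    </New>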

Earlier, when the number of documents was < 2M, Solr ran smoothly. But once the number of documents reached 8M, Solr started crashing with a Java heap space OutOfMemoryError. The following is the exception:

java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.FixedBitSet.<init>(FixedBitSet.java:115)
    at org.apache.lucene.spatial.prefix.IntersectsPrefixTreeFilter$1.start(IntersectsPrefixTreeFilter.java:62)
    at org.apache.lucene.spatial.prefix.AbstractVisitingPrefixTreeFilter$VisitorTemplate.getDocIdSet(AbstractVisitingPrefixTreeFilter.java:130)
    at org.apache.lucene.spatial.prefix.IntersectsPrefixTreeFilter.getDocIdSet(IntersectsPrefixTreeFilter.java:57)
    at org.apache.lucene.search.Filter$1.scorer(Filter.java:95)
    at org.apache.lucene.search.Weight.bulkScorer(Weight.java:137)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:768)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:485)
    at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1243)
    at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:926)
    at org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1088)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1609)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1485)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:561)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:518)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
  1. Is the above exception due to lack of memory?
  2. Is it due to the query being too long, which in turn is affecting the search?
sravan_kumar
  • 1. Yes. If you read the word "OutOfMemoryError", you can see how that's kinda obvious. 2. Probably, but you'll know for sure if you profile your application. – Kayaman Oct 15 '15 at 13:08
  • you mean profile Solr? – sravan_kumar Oct 15 '15 at 13:09
  • What is the -Xmx setting for your JVM? Have you tried to increase it? – rudolfv Oct 15 '15 at 13:09
  • @rudolfv -Xmx was set to 1gb – sravan_kumar Oct 15 '15 at 13:10
  • @sravan_kumar No, I mean profile your application. You're not developing Solr (I hope). – Kayaman Oct 15 '15 at 13:10
  • Possible duplicate of [How to deal with "java.lang.OutOfMemoryError: Java heap space" error (64MB heap size)](http://stackoverflow.com/questions/37335/how-to-deal-with-java-lang-outofmemoryerror-java-heap-space-error-64mb-heap) –  Oct 15 '15 at 13:10
  • @Kayaman oh ok... currently I am migrating data to 4gb instance and give 2gb heap space to Solr. Hope this may work – sravan_kumar Oct 15 '15 at 13:13

2 Answers

3

This is probably due to the number of filters: each filter uses 1 bit per document in your index. With 8M documents, each filter takes roughly 1 MB.

If the filterCache section in your solrconfig.xml is taken from the example configuration, its size is 512. This means that it will, over time, come to hold 512 × 1 MB ≈ 512 MB of filter data for your index. With a 1 GB heap, it sounds reasonable that it will run out of memory.

The easy solution is to lower the number of entries in the filterCache. That might negatively impact your search speed, or it might not influence it at all if your filters are unique between calls; you will have to test that.
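As an illustration only (the size here is a placeholder, not a recommendation), a smaller filterCache entry in solrconfig.xml could look like this:

    <!-- solrconfig.xml: keep fewer filter bitsets (each ~1 MB at 8M docs) in memory -->
    <filterCache class="solr.FastLRUCache"
                 size="64"
                 initialSize="64"
                 autowarmCount="0"/>

At 8M documents that caps the cache at roughly 64 MB of bitsets instead of ~512 MB.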

See https://wiki.apache.org/solr/SolrCaching#filterCache

Toke Eskildsen
  • Thanks for the reply. Now I am aware of different types of caches in Solr. Will run set of experiments with different heap sizes and cache sizes and check the behaviour. – sravan_kumar Oct 16 '15 at 11:07
  • hey lowering filter cache worked !! Will do the math for the required memory and move to a larger machine for more memory. Thank you very much!! – sravan_kumar Oct 16 '15 at 12:56
0

If you're filtering on your date field, then using a date range filter (in place of a Boolean OR with hundreds of values) will save Solr the I/O, CPU and memory overhead of scanning your collection hundreds of times per query.

Solr's TrieDateField type is indexed using a trie structure, so finding documents with date values within a range is a cheap operation (versus iterating over the entire collection).
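For example (the field name eventDate is made up here), a single range filter replaces the long OR list:

    fq=eventDate:[2015-01-01T00:00:00Z TO 2015-06-30T23:59:59Z]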

If you're instead querying for documents with dates "at the same time of day" over the past 1000-2000 days, then consider encoding the time-of-day separately in its own field (as an int, perhaps, to save space) so you can apply the time-of-day filter first before eliminating documents more than 2000 days old.
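A sketch of that idea, using hypothetical field names (timeOfDaySecs holding seconds since midnight, eventDate as before):

    fq=timeOfDaySecs:[32400 TO 36000]
    fq=eventDate:[NOW/DAY-2000DAYS TO NOW]

The cheap integer range covers 09:00-10:00 in this example; the date range then discards anything older than 2000 days.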

Peter Dixon-Moses
  • Hey, I am already using the DateRange field for my index. Apart from this filter I also need 4 other string filters, one of which is a Boolean filter having hundreds of values. – sravan_kumar Oct 19 '15 at 06:11
  • The Boolean filter with 100s of values is still likely to be the most expensive piece of the processing. Read this helpful piece re: the `cost` parameter for uncached filters (and PostFilters). https://lucidworks.com/blog/2012/02/10/advanced-filter-caching-in-solr/. You may discover that by first reducing the search-space using the *cheap to execute* filters (date range and string match), then the more-expensive boolean filter (w 100s of values) will be checked less-frequently, thereby lowering the total cost of your query. – Peter Dixon-Moses Oct 20 '15 at 03:03
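Concretely, the kind of uncached, ordered filter that comment describes might be written with local params like this (the cost value and field are illustrative):

    fq={!cache=false cost=200}field1:("value-1" OR "value-2" OR ... OR "value-n")

cache=false keeps the big Boolean filter out of the filterCache, and a higher cost asks Solr to evaluate it after the cheaper filters.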