
I'm trying to index a Drupal site with around 1.5 million nodes. Most are simple nodes; about 100k are larger (PDF documents processed with Tika).

I've tried indexing several times now and it always fails the same way: Solr crashes/hangs with high load and memory usage after several days of indexing (I'm not looking for maximum throughput per se). I first moved the install to a bigger box, from 2 CPUs/2 GB RAM to 8 cores/16 GB RAM. That fixed the problem for a little while, but now the situation is almost identical. I'm able to index about 500k nodes before it fails.

Java is using far more memory than the heap size (currently 8000M), and there is a lot of swapping. Load is around 3.0 (on both the small and the big box). Solr is not responding to indexing requests; searching is slow but possible, and the admin interface is responsive.

Restarting Solr fixes the problem for a little while, but it always comes back.

When checking the index size during a crash I notice the directory size fluctuates a lot. After starting Solr the directory is around 6.5 GB; it works its way up to 13 GB before falling back to 6.5 GB again. This keeps repeating.

I've added the settings for logging out-of-memory errors, but this doesn't produce any logs.

I'm using the standard Solr configuration for Drupal 6. I've tried different merge factors, but that doesn't seem to do anything to help the problem.
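
For reference, this is roughly where that setting lives in solrconfig.xml under the Solr 3.x layout (the values here are only illustrative, not necessarily what my install uses):

<indexDefaults>
  <!-- how many same-size segments may accumulate before they are merged into one -->
  <mergeFactor>10</mergeFactor>
  <!-- how much RAM Lucene buffers for added documents before flushing a new segment to disk -->
  <ramBufferSizeMB>32</ramBufferSizeMB>
</indexDefaults>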

Anyone have any ideas? If you need more information, I'll try to respond as quickly as possible!

This is in my log at the moment:

Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: /usr/local/solr35/example/multicore/mydivp/data/index/_1bm.fnm (No such file or directory)
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:517)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: java.io.FileNotFoundException: /usr/local/solr35/example/multicore/mydivp/data/index/_1bm.fnm (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:214)
    at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:345)
    at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:74)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:73)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:705)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4400)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3940)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)
2012-04-03 14:26:25.409:INFO::Shutdown hook complete

Kind regards, Bram Rongen

Update 2012-04-06

It still isn't working. Inspecting my data/index/ directory reveals that Solr keeps rebuilding/merging: one segment gets built, and once that is done the previous one gets deleted and Solr starts again, even when no new documents are added. Another weird thing is that the .fdt file doesn't grow, even though the Solr status indicates around 300k more documents have been indexed. The largest .fdt file in the directory never gets bigger than 4.9 GB.

Any thoughts?

Bram Rongen
  • The variation in disk space usage is normal. Solr does automatic merging of the index segments when they get too big. Out of memory errors should already be logged to the main servlet container log, catalina.out for Tomcat or jetty.log for Jetty. What version of Java? – Walter Underwood Apr 03 '12 at 16:53
  • You are misunderstanding how Java utilizes memory, [the heap isn't what the JVM actually uses, it is much more complicated than that](http://stackoverflow.com/a/9146775/177800). –  Apr 03 '12 at 18:28
  • I'm running Ubuntu 10.04 with the latest Java: java version "1.6.0_20", OpenJDK Runtime Environment (IcedTea6 1.9.13) (6b20-1.9.13-0ubuntu1~10.04.1), OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode). Before, I was running on CentOS. I might be misunderstanding the way Java utilizes memory, but at the moment it doesn't matter what value I assign to -Xmx; the JVM eats all physical memory and swap, killing performance ;) – Bram Rongen Apr 04 '12 at 13:56

2 Answers


Hey guys,

I've changed the MergePolicy to LogByteSizeMergePolicy and the MergeScheduler to ConcurrentMergeScheduler, which seems to solve the problem. Still not totally sure what happened, but we're back up and running ;)
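
For reference, a rough sketch of how that looks in solrconfig.xml using the Solr 3.x class-attribute syntax (the exact placement inside <indexDefaults> is how I understand it, so treat it as an assumption):

<indexDefaults>
  <!-- choose segments to merge by their size in bytes rather than by document count -->
  <mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy"/>
  <!-- perform merges in background threads so they don't block indexing -->
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
</indexDefaults>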

Thanks!

Bram Rongen

This blog post might help in understanding the performance factors (it's more focused on queries) and the merge policies:

http://www.nickveenhof.be/blog/upgrading-apache-solr-14-35-and-its-implications

Also, are your Solr and Drupal running on the same server?

Extra info: it is recommended that you set luceneMatchVersion to the latest version (LUCENE_35) when you use LogByteSizeMergePolicy or the defaults. The newer version of Lucene should also have memory-leak fixes:

<?xml version="1.0" encoding="UTF-8" ?>
<config name="my_config">
  <!-- Controls what version of Lucene various components of Solr
       adhere to.  Generally, you want to use the latest version to
       get all bug fixes and improvements. It is highly recommended
       that you fully re-index after changing this setting as it can
       affect both how text is indexed and queried.
    -->
  <luceneMatchVersion>LUCENE_35</luceneMatchVersion>
  <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
  <indexDefaults>
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>10</mergeFactor>
    <!-- Tell Lucene when to flush documents to disk.
    Giving Lucene more memory for indexing means faster indexing at the cost of more RAM
    If both ramBufferSizeMB and maxBufferedDocs is set, then Lucene will flush based on whichever limit is hit first.
    -->
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>20000</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>10000</commitLockTimeout>
    <!--
     Expert:
     The Merge Policy in Lucene controls how merging is handled by Lucene.  The default in 2.3 is the LogByteSizeMergePolicy, previous
     versions used LogDocMergePolicy.

     LogByteSizeMergePolicy chooses segments to merge based on their size.  The Lucene 2.2 default, LogDocMergePolicy chose when
     to merge based on number of documents

     Other implementations of MergePolicy must have a no-argument constructor
     -->
    <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>
...
  • Hi Nick, thank you for answering! Solr and Drupal are running on different servers. I suspect it has something to do with merge policies, but I don't know what. I've restarted Solr, which meant it ran for another 20 hours. Right now it's creating new .fdt files and deleting older ones. – Bram Rongen Apr 04 '12 at 14:10
  • Hi, actually I've already added LUCENE_35 to the configuration, but it doesn't help :( – Bram Rongen Apr 06 '12 at 10:17
  • Alright, I've tried different merge policies, but every time my biggest .fdt file reaches 4.9 GB Solr just crashes :( I've hit this limit several times now. Any ideas? – Bram Rongen Apr 12 '12 at 19:20
  • Switched to the Sun JRE, but to no avail :( – Bram Rongen Apr 17 '12 at 19:36