
Open Search Server (OSS) is crashing while crawling files. OSS runs as a daemon on an Ubuntu box. This is a production server with 64 GB of RAM and 12 cores, crawling about 20 GB of files on an extremely fast NAS that it mounts. OSS is allotted 2 GB of memory. The largest file that should get crawled is about 1.3 GB, and there are 5 MP4 files that are each over 1 GB.

Usually at some point during the crawl, OSS becomes completely unresponsive, and restarting OSS fixes the problem. Today I monitored a crawl, which normally uses one or two cores at a time; when OSS crashed, it was maxing out all 12 cores. Total memory usage on the server was fine, but I'm not sure how much of it OSS was using.
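One way to fill in the missing per-process numbers might be to read `/proc/<pid>/status` for the OSS daemon while a crawl runs. The following is only a minimal sketch, assuming a Linux host (true for Ubuntu) and that the PID of the OSS process is passed as the first argument; the class name `ProcStatus` and the method `parseField` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ProcStatus {
    // Returns the numeric value of a /proc/<pid>/status field, e.g.
    // "VmRSS:     2048 kB" (value in kB) or "Threads:   57".
    // Returns -1 if the field is not present.
    public static long parseField(List<String> lines, String key) {
        for (String line : lines) {
            if (line.startsWith(key + ":")) {
                String[] parts = line.substring(key.length() + 1).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: PID of the OSS daemon (assumed to be supplied by the caller).
        List<String> lines = Files.readAllLines(Path.of("/proc", args[0], "status"));
        System.out.println("Resident memory (kB): " + parseField(lines, "VmRSS"));
        System.out.println("Threads:              " + parseField(lines, "Threads"));
    }
}
```

Run periodically during a crawl, this would show whether resident memory climbs toward the 2 GB allotment and whether the thread count jumps when all 12 cores become busy. Note that resident memory covers the whole JVM process, not just the Java heap.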

We've looked at the OSS log files, and no single error consistently precedes the crashes, but two warnings are pretty common in the logs:

WARN: org.apache.cxf.jaxrs.utils.JAXRSUtils - Both com.jaeksoft.searchlib.webservice.crawler.database.DatabaseImpl#run and com.jaeksoft.searchlib.webservice.crawler.database.DatabaseImpl#run are equal candidates for handling the current request which can lead to unpredictable results

WARN: root - Low memory free conditions: flushing crawl buffer

We have one index that handles all files. It is based on the file crawler template—the only changes are:

  1. An extra analyzer that uses 4 regex replaces.
  2. An extra field that copies the url field and uses the analyzer from item 1.
  3. We added one disk location, which has all the files.
  4. We join another index in our query.

When a crawl does complete, querying the index works fine afterwards. I suspect the crashes only happen if a search query hits the index during the crawl, but I haven't been able to confirm that yet.

ktamlyn
J. Byers
  • The reason for the crash could be an OutOfMemoryError. I suggest starting the JVM with JMX and heap dump on OutOfMemoryError enabled, so you can monitor memory consumption and get a dump on OOM crashes. – aventurin Jul 30 '16 at 21:01
  • Find out if the MP4 files are the cause by excluding them from the crawl. – ktamlyn Jul 31 '16 at 02:55
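
To illustrate the JMX suggestion in the comments: the sketch below polls the heap usage of a remote JVM over JMX. It assumes the OSS JVM was started with the standard HotSpot options `-Dcom.sun.management.jmxremote.port=9010`, `-Dcom.sun.management.jmxremote.authenticate=false` and `-Dcom.sun.management.jmxremote.ssl=false` (suitable only on a trusted network), plus `-XX:+HeapDumpOnOutOfMemoryError` for the heap dump. How those options are passed to the OSS daemon is not covered here, and the port 9010, the host `localhost`, and the class name `HeapProbe` are assumptions made for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class HeapProbe {
    // Formats heap usage as "used/max MB"; max is reported as -1 when undefined.
    public static String describe(MemoryUsage heap) {
        long mb = 1024L * 1024L;
        long max = heap.getMax() < 0 ? -1 : heap.getMax() / mb;
        return (heap.getUsed() / mb) + "/" + max + " MB";
    }

    public static void main(String[] args) throws Exception {
        // Host and port are assumptions; they must match the jmxremote options
        // the target JVM was started with.
        String url = "service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi";
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url))) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Proxy for the remote JVM's standard memory MXBean.
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            System.out.println("Remote heap: " + describe(memory.getHeapMemoryUsage()));
        }
    }
}
```

If the reported usage stays pinned near the 2 GB maximum just before OSS stops responding, that would be consistent with the OutOfMemoryError theory; a graphical JMX client such as JConsole can show the same figures without any code.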

0 Answers