16

I need to access a Lucene index (created by crawling several webpages using Nutch), but it gives the error shown below:

java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.FSDirectory@/home/<path>: files:
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:185)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:148)
    at DictionaryGenerator.generateDict(DictionaryGenerator.java:24)
    at DictionaryGenerator.main(DictionaryGenerator.java:56)

I googled, but the causes I found did not match my situation. The fact that the path is shown in the message probably means that the directory is not empty.
Thanks

crazyaboutliv

3 Answers

29

Another hint: I had the same error and found that I had not closed the IndexWriter after creating the index, which proved very unforgiving. My index directory contained some .lock files but no segments or segments.gen files, which are what the reader looks for. See here #3 for details
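
To see quickly whether a directory is in the state described above (lock files present, no segments files), something like the following can help. It is only a sketch using plain java.nio, not a Lucene API: the class name `SegmentsFileCheck` and its method are made up for illustration, and it looks at file names only, so it cannot tell whether the index is otherwise valid. It counts regular files only, because a Nutch crawl folder can contain a `segments` directory of its own, which is not what the reader wants.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Lucene): inspects file names only.
class SegmentsFileCheck {

    // Returns true if indexDir contains at least one regular file
    // whose name starts with "segments" (e.g. segments_1, segments.gen).
    public static boolean hasSegmentsFile(Path indexDir) throws IOException {
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(indexDir,
                p -> Files.isRegularFile(p)
                        && p.getFileName().toString().startsWith("segments"))) {
            return matches.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the index directory as the first argument; defaults to the current directory.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        if (hasSegmentsFile(indexDir)) {
            System.out.println("Found a segments file in " + indexDir.toAbsolutePath());
        } else {
            System.out.println("No segments file in " + indexDir.toAbsolutePath()
                    + " (was the IndexWriter closed, and is this the right folder?)");
        }
    }
}
```

If this reports no segments file for the folder you pass to the reader, the reader will fail in the same way, whatever else is in the folder.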

nir
  • Just for those who are wondering (as I did): Even if you close your IndexWriter, the `write.lock` file will still exist in your folder. So don't worry if this file doesn't get deleted. – Munchkin Jun 18 '15 at 13:46
  • @nir, this was absolutely the problem for me! Thank you! – Zac Taylor Mar 28 '20 at 18:13
9

Basically, the error message says that Lucene did not find the proper files in the index directory. I suggest checking the following:

  1. Verify the path of the index directory fits what you think it should be.
  2. Do the Nutch and Lucene versions used match? This may stem from a version difference.
  3. Is there a permissions issue? Can you read the files in the directory?
  4. Try looking at the index using Luke. If you cannot, there is probably some corruption in the index.

If none of these help, please post the indexing part of the code.
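
Points 1 and 3 of the list can be checked from the same JVM that opens the index. The following is a rough sketch with plain java.nio; `IndexPathCheck` and `findProblems` are hypothetical names, not anything from Lucene or Nutch. It resolves the path the way the running program sees it (relative paths depend on the working directory) and reports anything missing or unreadable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper: checks the path and read permissions only.
class IndexPathCheck {

    // Returns a list of human-readable problems; an empty list means none were found.
    public static List<String> findProblems(Path indexDir) throws IOException {
        List<String> problems = new ArrayList<>();
        // Resolve against the current working directory, as the JVM would.
        Path resolved = indexDir.toAbsolutePath().normalize();
        if (!Files.exists(resolved)) {
            problems.add("does not exist: " + resolved);
        } else if (!Files.isDirectory(resolved)) {
            problems.add("not a directory: " + resolved);
        } else if (!Files.isReadable(resolved)) {
            problems.add("directory not readable: " + resolved);
        } else {
            // Check every entry directly inside the directory.
            try (Stream<Path> entries = Files.list(resolved)) {
                entries.filter(p -> !Files.isReadable(p))
                       .forEach(p -> problems.add("not readable: " + p));
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> problems = findProblems(indexDir);
        if (problems.isEmpty()) {
            System.out.println("No path or permission problems: "
                    + indexDir.toAbsolutePath().normalize());
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Run it as the same user and from the same working directory as the program that fails; otherwise the result says little about what that program actually sees.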

Yuval F
  • I did all of them except the Nutch and Lucene versions. I was not aware that Lucene and Nutch have to be compatible. If it helps, the Lucene version is 2.2. I can access the files. In fact, I am running the Java program in the same directory as the index. Also, I checked the index using Luke and it's definitely fine. The thing is that I just became a part of the project. The index is the result of an extensive crawl by Nutch, so I do not have any indexing code. It was just a crawl. But I will still try to find out the exact picture. – crazyaboutliv Sep 27 '10 at 16:57
  • One thing I have observed is that the newer version of Nutch (1.1) generates 5 folders after a crawl, while the data I have has only 4 folders (one of which is segments). Can that be an issue? – crazyaboutliv Sep 27 '10 at 16:58
  • Like Yuval said, make sure that the Java program that you use to read the index uses the same version of Lucene that Nutch used to create the index. – Pascal Dimassimo Sep 28 '10 at 12:44
2

Stumbled upon this issue in 2020:

I had opened the IndexReader using the

org.apache.lucene.index.DirectoryReader#open(org.apache.lucene.store.Directory)

method instead of

org.apache.lucene.index.DirectoryReader#open(org.apache.lucene.index.IndexWriter)

The first one resulted in the error described above, while the latter worked fine on an empty directory and seems to be the way to go here.

Sebastian Schmitt