
Does closing the Lucene IndexWriter after each document addition slow down my indexing process?

I imagine that closing and reopening the IndexWriter will slow down my indexing process, but is that actually true for Lucene?

Basically, I have a Lucene indexer step in a Spring Batch job, and I create the indices in an ItemProcessor. The indexer step is a partitioned step; I create the IndexWriter when the ItemProcessor is created and keep it open until step completion.

    @Bean
    @StepScope
    public ItemProcessor<InputVO,OutputVO> luceneIndexProcessor(@Value("#{stepExecutionContext[field1]}") String str) throws Exception {
        // One IndexWriter per partition, created together with the step-scoped processor
        // and kept open for the whole step.
        boolean exists = IndexUtils.checkIndexDir(str);
        String indexDir = IndexUtils.createAndGetIndexPath(str, exists);
        IndexWriterUtils indexWriterUtils = new IndexWriterUtils(indexDir, exists);
        IndexWriter indexWriter = indexWriterUtils.createIndexWriter();
        return new LuceneIndexProcessor(indexWriter);
    }
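
IndexUtils and IndexWriterUtils are my own helper classes; a simplified sketch of what createIndexWriter() might look like (the analyzer and open-mode choices below are assumptions for illustration only):

    // Hypothetical sketch of what IndexWriterUtils.createIndexWriter() might do.
    public IndexWriter createIndexWriter() throws IOException {
        Directory directory = FSDirectory.open(Paths.get(indexDir));
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        // Append to an existing index, or create a fresh one if the directory is new.
        config.setOpenMode(exists ? IndexWriterConfig.OpenMode.APPEND : IndexWriterConfig.OpenMode.CREATE);
        return new IndexWriter(directory, config);
    }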

Is there a way to close this IndexWriter after step completion?

Also, I was running into issues because I also search within this step to find duplicate documents, but I fixed that by calling writer.commit() before opening the reader and searching.
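
For illustration, a minimal sketch of that commit-then-search pattern (Lucene 6.x-era API; the "id" field name and the directory/docId variables are assumptions):

    // Make pending changes visible to a newly opened reader before searching for duplicates.
    writer.commit();
    try (DirectoryReader reader = DirectoryReader.open(directory)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        TopDocs hits = searcher.search(new TermQuery(new Term("id", docId)), 1);
        boolean isDuplicate = hits.totalHits > 0;
        // ... skip adding the document if isDuplicate is true
    }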

Please suggest whether I need to close and reopen the writer after each document addition, or whether I can keep it open all along. Also, how do I close it in StepExecutionListenerSupport's afterStep?

Initially, I was closing and reopening the writer for each document, but the indexing process was very slow, so I suspected that might be the reason.

  • You should *definitely* keep a single `IndexWriter` open for the entire indexing process. Opening a new one for each document would be expected to slow it down a great deal, as you've already seen. – femtoRgon Sep 26 '16 at 15:14

1 Answer


In development, the index directory is small, so we may not see much gain there, but for large index directories we should avoid unnecessarily creating and closing the IndexWriter as well as the IndexReader.

In Spring Batch, I accomplished this with the following steps:

1. As pointed out in my other question, we first need to address the problem of serialization so that the object can be placed in the ExecutionContext.

2. We create an instance of a composite serializable object and put it into the ExecutionContext in the partitioner; a hypothetical sketch of such a holder follows.
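
For illustration only, a minimal sketch of what such a holder could look like. The class name SerializableLuceneObjects matches what I use below, but the transient fields and constructor are assumptions; IndexWriter and IndexReader are not themselves serializable, so the actual serialization handling is what the linked question addresses.

    // Hypothetical composite holder stored in the partition's ExecutionContext.
    public class SerializableLuceneObjects implements Serializable {

        // Lucene writer/reader are not serializable; held transiently here (assumption).
        private transient IndexWriter luceneIndexWriter;
        private transient IndexReader luceneIndexReader;

        public SerializableLuceneObjects(IndexWriter writer, IndexReader reader) {
            this.luceneIndexWriter = writer;
            this.luceneIndexReader = reader;
        }

        public IndexWriter getLuceneIndexWriter() { return luceneIndexWriter; }
        public IndexReader getLuceneIndexReader() { return luceneIndexReader; }
    }

    // In the partitioner: one holder per partition.
    executionContext.put("luceneObjects", new SerializableLuceneObjects(writer, reader));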

3. Pass the value from the ExecutionContext to your step's reader, processor, or writer in the configuration:

    @Bean
    @StepScope
    public ItemProcessor<InputVO,OutputVO> luceneIndexProcessor(
            @Value("#{stepExecutionContext[field1]}") String field1,
            @Value("#{stepExecutionContext[luceneObjects]}") SerializableLuceneObjects luceneObjects) throws Exception {
        // The holder created in the partitioner is injected from the step's ExecutionContext.
        return new LuceneIndexProcessor(luceneObjects);
    }

4. Use the instance passed into the processor wherever you need it, and use a getter method to obtain the index reader or writer, e.g.

    public IndexWriter getLuceneIndexWriter() { return luceneIndexWriter; }
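
As an illustration, a hypothetical sketch of how the processor can reuse the long-lived writer; the InputVO/OutputVO accessors and the document fields are assumptions:

    // Hypothetical processor that reuses the writer from the holder for every item.
    public class LuceneIndexProcessor implements ItemProcessor<InputVO, OutputVO> {

        private final SerializableLuceneObjects luceneObjects;

        public LuceneIndexProcessor(SerializableLuceneObjects luceneObjects) {
            this.luceneObjects = luceneObjects;
        }

        @Override
        public OutputVO process(InputVO item) throws Exception {
            Document doc = new Document();
            doc.add(new StringField("id", item.getId(), Field.Store.YES));
            // Reuse the same writer for every item; it is closed only in afterStep.
            luceneObjects.getLuceneIndexWriter().addDocument(doc);
            return new OutputVO(item);
        }
    }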

5. Finally, in StepExecutionListenerSupport's afterStep(StepExecution stepExecution), close the writer and reader by retrieving them from the ExecutionContext:

    ExecutionContext executionContext = stepExecution.getExecutionContext();
    SerializableLuceneObjects slObjects = (SerializableLuceneObjects) executionContext.get("luceneObjects");
    IndexWriter luceneIndexWriter = slObjects.getLuceneIndexWriter();
    IndexReader luceneIndexReader = slObjects.getLuceneIndexReader();
    // Close once, after the whole step has completed.
    if (luceneIndexWriter != null) luceneIndexWriter.close();
    if (luceneIndexReader != null) luceneIndexReader.close();
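
Putting it together, a hedged sketch of what such a listener might look like; the class name and the exception handling are assumptions:

    // Hypothetical listener registered on the indexer step.
    public class LuceneIndexStepListener extends StepExecutionListenerSupport {

        @Override
        public ExitStatus afterStep(StepExecution stepExecution) {
            ExecutionContext executionContext = stepExecution.getExecutionContext();
            SerializableLuceneObjects slObjects =
                    (SerializableLuceneObjects) executionContext.get("luceneObjects");
            try {
                if (slObjects.getLuceneIndexWriter() != null) slObjects.getLuceneIndexWriter().close();
                if (slObjects.getLuceneIndexReader() != null) slObjects.getLuceneIndexReader().close();
            } catch (IOException e) {
                throw new IllegalStateException("Failed to close Lucene writer/reader after step", e);
            }
            return stepExecution.getExitStatus();
        }
    }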