
I have to index about 400 billion documents into Solr 6.3. I am using pysolr to parse my CSV data before indexing. How can I speed up my indexing? To index a document into Solr, pysolr uses the add method, which has the following signature by default:

add(self, docs, boost=None, fieldUpdates=None, commit=True, softCommit=False, commitWithin=None, waitFlush=None, waitSearcher=None, overwrite=None, handler='update')

One basic option is to set commit and softCommit to False for faster indexing. Is that the right approach?
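
For example, I assume that disabling the per-request commit and committing once at the end would look roughly like this (just a sketch; the core URL and the documents are placeholders):

import pysolr

# placeholder URL; point it at the real core
solr = pysolr.Solr('http://localhost:8983/solr/mycore', timeout=120)

docs = [{'id': '1', 'title': 'first doc'},
        {'id': '2', 'title': 'second doc'}]

solr.add(docs, commit=False, softCommit=False)  # buffer only, no commit
solr.commit()  # one explicit commit at the end of the run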

Are there any other options for performing fast indexing?

Hafiz Muhammad Shafiq

1 Answer


If you commit everything in a single go, it will be very memory expensive. A better option is to commit in batches, so I would suggest keeping a count variable:

# assuming `solr` is a pysolr.Solr client and `batch` holds the parsed rows
if count % 10000 == 0:
    solr.add(batch, commit=True)  # perform the Solr commit for this batch
    batch = []
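
Putting that together, the full loop might look something like the following sketch (the core URL, file name, and field names are placeholders I am assuming, not from the question):

import csv

import pysolr

solr = pysolr.Solr('http://localhost:8983/solr/mycore')  # placeholder URL

batch, count = [], 0
with open('data.csv', newline='') as f:
    for row in csv.DictReader(f):  # each CSV row becomes one Solr document
        batch.append(row)
        count += 1
        if count % 10000 == 0:
            solr.add(batch, commit=True)  # commit once per 10,000 docs
            batch = []
if batch:
    solr.add(batch, commit=True)  # flush the final partial batch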

Also, make your indexing script multi-threaded so these batches complete faster.
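
For the multi-threaded part, a minimal sketch with concurrent.futures could look like this (my own illustration, with placeholder names; each worker builds its own client so threads do not share one HTTP session):

from concurrent.futures import ThreadPoolExecutor

import pysolr

SOLR_URL = 'http://localhost:8983/solr/mycore'  # placeholder URL

# illustrative batches; replace with lists of 10,000 parsed CSV rows
batches = [[{'id': str(i * 100 + j)} for j in range(100)] for i in range(10)]

def index_batch(batch):
    solr = pysolr.Solr(SOLR_URL)   # per-worker client
    solr.add(batch, commit=False)  # defer commits while workers run

with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(index_batch, batches))  # list() surfaces any exceptions

pysolr.Solr(SOLR_URL).commit()  # single commit after all batches are sent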

Aman Tandon