I have an ElasticSearch index with around 200M documents and a total index size of 90 GB.

I changed the mapping, so I would like ElasticSearch to re-index all the documents.

I wrote a script that creates a new index (with the new mapping), then goes over all the documents in the old index and puts them into the new one.

It seems to work, but the problem is that it runs extremely slowly. It started at 300 documents/minute two days ago, and the speed is now 150 documents/minute.

The script runs on a machine on the same network as the ElasticSearch machines.

At this speed, the re-index will take a month to finish.

Does anybody know of a faster technique for re-indexing an ElasticSearch index?

ROMANIA_engineer
diemacht

2 Answers

4

This was answered in the Google Groups:

Option A: Use bulk index operations.

Option B: Use the re-index plug-in, which runs inside the ES machine: https://github.com/karussell/elasticsearch-reindex
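
For Option A, a minimal sketch of what a bulk request body looks like, assuming the standard newline-delimited format of the bulk API (one action line followed by one source line per document). The function name `build_bulk_body` and the index name used below are made up for illustration, and older Elasticsearch versions may also expect a `_type` field in the action line.

```python
import json


def build_bulk_body(index_name, docs):
    """Serialize (doc_id, source) pairs into a bulk request body.

    Hypothetical helper: it only builds the payload, sending it to the
    cluster is left to whatever HTTP or client library is in use.
    """
    lines = []
    for doc_id, source in docs:
        # Action line: says which index and ID the following source line targets.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


body = build_bulk_body("new_index", [("1", {"title": "a"}), ("2", {"title": "b"})])
```

Sending a few hundred or a few thousand documents per request this way avoids paying one HTTP round trip per document, which is the usual reason a one-by-one script is slow.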

diemacht
0

The proper way to reindex with Elasticsearch is to use the scan and scroll APIs, which should be supported by Pyes.

It seems like the Pyes library has a reindex method, but I don't have experience with it.
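
The overall shape of a scroll-based reindex can be sketched as a loop that pulls one page of hits at a time and pushes each page to the new index in bulk batches. This is only an outline under stated assumptions: `fetch_page` and `send_bulk` are hypothetical callables that you would implement with your client of choice (they are not Pyes functions), and `fetch_page` is assumed to return a `(scroll_id, hits)` pair with an empty `hits` list once the scroll is exhausted.

```python
def reindex(fetch_page, send_bulk, batch_size=1000):
    """Copy documents page by page and return how many were copied.

    fetch_page(scroll_id) -> (next_scroll_id, hits)   # hypothetical
    send_bulk(batch)      -> None                     # hypothetical
    """
    total = 0
    scroll_id = None  # None means "start a new scroll"
    while True:
        scroll_id, hits = fetch_page(scroll_id)
        if not hits:
            break  # an empty page signals the end of the scroll
        # Split the page into bulk-sized batches before writing.
        for start in range(0, len(hits), batch_size):
            batch = hits[start:start + batch_size]
            send_bulk(batch)
            total += len(batch)
    return total


# Usage with in-memory stand-ins instead of a real cluster.
pages = [[{"_id": i} for i in range(5)], [{"_id": i} for i in range(5, 8)], []]
written = []


def fake_fetch(scroll_id):
    index = 0 if scroll_id is None else scroll_id
    return index + 1, pages[index]


copied = reindex(fake_fetch, written.append, batch_size=2)
```

Keeping the read and write steps behind two small functions also makes it easy to time each side separately and see whether the script or the cluster is the slow part.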

(If you can get over using Ruby instead of Python :), the Tire Ruby client has an Index#reindex method: https://github.com/karmi/tire/blob/master/test/integration/reindex_test.rb. It should be fast enough for your data.)

karmi
  • Thanks Karmi! Do you have any approximation what should be the expected time to do such an operation on an index of 90Gb (200M documents)? – diemacht Jun 10 '13 at 06:18
  • It depends on whether you'd be able to parallelize the operation or not. Elasticsearch can handle pretty high write load, but the reindexing script is usually the bottleneck. Try to reindex just a portion of the data, and extrapolate -- the performance of the scan/scroll API should not "decay" over time. – karmi Jun 10 '13 at 11:10
  • It appears that Tire has been deprecated. Elastic now lists recommended clients here: https://github.com/elastic/elasticsearch-rails – spuder Apr 09 '15 at 17:55
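
The "reindex a portion and extrapolate" suggestion in the comments comes down to simple arithmetic, sketched below. The function name, the sample figures, and the `workers` parameter are all hypothetical, and the estimate assumes throughput stays constant over time and scales linearly with the number of parallel workers, which a real cluster may not deliver.

```python
def estimate_total_hours(total_docs, sample_docs, sample_seconds, workers=1):
    """Extrapolate the full reindex duration from a timed sample run."""
    docs_per_second = sample_docs / sample_seconds
    # Assumption: each extra worker adds the same throughput as the first.
    total_seconds = total_docs / (docs_per_second * workers)
    return total_seconds / 3600


# Made-up sample: 1M documents copied in 10 minutes, index of 200M documents.
single = estimate_total_hours(200_000_000, 1_000_000, 600)
parallel = estimate_total_hours(200_000_000, 1_000_000, 600, workers=4)
```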