I'm trying to reIndex ElasticSearch, I used Scan and Bulk API, but it's very slow, how can I parallel the process to make it faster. My python code as following:
actions=[]
for hit in helpers.scan(es,scroll='20m',index=INDEX,doc_type=TYPE,params=
{"size":100}):
value= hit.get('_source')
idval = hit.get('_id')
action = indexAction(INDEX_2,TYPE_2,idval,value)
actions.append(action)
count+=1
if(count%200==0):
helpers.bulk(es, actions,stats_only=True,params=
{"consistency":"one","chunk_size":200})
actions=[]
Should I make the scan multiple process or should I make the bulk multiple processes. I've bean wandering how does ElasticSearch-Hadoop to implement this. My index has 10 nodes and 20 shards.