I have several million history objects that I need to save to Elasticsearch. What would be the best way to do this, without going into the internals of Elasticsearch? Here is the pattern I'm currently using:
from elasticsearch import helpers  # bulk helpers from elasticsearch-py

ACTIONS = []
NUM_ACTIONS_TO_BULK = 10000

for num, item in enumerate(HISTORY_DATA.values()):
    ACTIONS.append({
        "_index": ES_INDEX_NAME,
        "_type": "_doc",
        "_id": item.pop('_id'),
        "_source": item,
    })
    # Flush every 10k actions, and once more for the final partial batch
    if (len(ACTIONS) == NUM_ACTIONS_TO_BULK) or (num == len(HISTORY_DATA) - 1):
        log.info('%s/%s - Saving %s items to ES...' % (num, len(HISTORY_DATA), len(ACTIONS)))
        _ = helpers.bulk(self.es, ACTIONS)
        ACTIONS = []
The above saves the data to ES in batches of 10k. Is this the best/most efficient way to save things to ES? For example, what if I passed all 15M objects to helpers.bulk in a single call -- does it chunk the items internally, or does it try to send them all in one request? Does it look like I'm missing anything in the above?
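For reference, this is roughly what I mean by the single-call alternative. I'm assuming here that helpers.bulk accepts a generator of actions and a chunk_size keyword that controls the per-request batch size; I haven't verified the defaults:

    from elasticsearch import helpers

    # Assumption: helpers.bulk consumes any iterable of actions and batches
    # them internally, so the manual 10k accumulation above would go away.
    def generate_actions():
        for item in HISTORY_DATA.values():
            yield {
                "_index": ES_INDEX_NAME,
                "_type": "_doc",
                "_id": item.pop('_id'),
                "_source": item,
            }

    # One call for everything; chunk_size would control how many actions
    # go into each underlying bulk request (assumed keyword).
    success, errors = helpers.bulk(self.es, generate_actions(), chunk_size=10000)

If that is how it behaves, the generator version would also avoid building the 10k-item ACTIONS list in memory between flushes.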