
I'm new to Elasticsearch. I'm trying to index a JSON file containing 100,000+ objects. The format of my JSON file is:

    [{"ingredients": [{"text": "Butter"}, {"text": "Strawberries"}, {"text": "Granola"}], 
    "url": "http://tastykitchen.com/recipes/breakfastbrunch/yogurt-parfaits/", 
    "title": "Yogurt Parfaits", 
    "id": "000095fc1d", 
    "instructions": [{"text": "Layer all ingredients in a serving dish."}]},
     {"ingredients":
     .....]

This is in the form of a list. The Python code I'm using right now to index the file is:

    from elasticsearch import Elasticsearch
    import json

    es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

    with open('data.json') as f:
        data = json.load(f)

    # One HTTP request per document
    for i in data:
        res = es.index(index='food', doc_type='Recipe', id=i['id'], body=i)

This method takes a long time and is inefficient. The other methods I've read about require the file to be in this format:

{"index": {"_index": "index_name", "_type": "index_type", "_id": "doc_id"}}
{"ingredients:....

Can you suggest an efficient method to index the file?

  • Possible duplicate: https://stackoverflow.com/questions/20288770/how-to-use-bulk-api-to-store-the-keywords-in-es-by-using-python – Frank Apr 22 '20 at 18:25
  • Does this answer your question? [How to use Bulk API to store the keywords in ES by using Python](https://stackoverflow.com/questions/20288770/how-to-use-bulk-api-to-store-the-keywords-in-es-by-using-python) – Joe - GMapsBook.com Apr 22 '20 at 18:48

1 Answer


Try the Elasticsearch Bulk API.

Performs multiple indexing or delete operations in a single API call. This reduces overhead and can greatly increase indexing speed.

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html#docs-bulk
https://elasticsearch-py.readthedocs.io/en/master/helpers.html
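
The `bulk` helper in the Python client builds the action/metadata lines for you, so you don't need to rewrite your file by hand. Here is a minimal sketch, assuming the same `data.json` layout, index name, and doc type as in your question (the `generate_actions` name is just for illustration; on Elasticsearch 7+ drop the `_type` field):

    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk
    import json

    es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

    with open('data.json') as f:
        data = json.load(f)

    def generate_actions():
        # Yield one action per document; the recipe's own "id" becomes the _id
        for doc in data:
            yield {
                '_index': 'food',
                '_type': 'Recipe',  # drop on Elasticsearch 7+
                '_id': doc['id'],
                '_source': doc,
            }

    # bulk() batches the actions into chunked Bulk API requests
    # (500 documents per request by default) instead of one request per document
    success, _ = bulk(es, generate_actions())
    print('Indexed %d documents' % success)

If the default batches are too big or too small you can pass `chunk_size` to `bulk()`, and the same module also offers `parallel_bulk` for multi-threaded indexing.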

Ashraful Islam