
I have a bunch of JSON files (100 of them), named merged_file 1.json, merged_file 2.json, and so on.

How do I index all these files into Elasticsearch using Python (elasticsearch_dsl)?

I am using this code, but it doesn't seem to work:

from elasticsearch_dsl import Elasticsearch
import json
import os
import sys

es = Elasticsearch()

json_docs =[]

directory = sys.argv[1]

for filename in os.listdir(directory):
    if filename.endswith('.json'):
        with open(filename,'r') as open_file:
            json_docs.append(json.load(open_file))

es.bulk("index_name", "type_name", json_docs)

The JSON looks like this:

{"one":["some data"],"two":["some other data"],"three":["other data"]}

What can I do to make this correct?

anshaj

1 Answer


For this task you should be using elasticsearch-py (pip install elasticsearch):

from elasticsearch import Elasticsearch, helpers
import json
import os
import sys

es = Elasticsearch()

def load_json(directory):
    """Use a generator so all the files don't have to be loaded into memory at once."""
    for filename in os.listdir(directory):
        if filename.endswith('.json'):
            # Join with the directory so this also works when it isn't the current working directory
            with open(os.path.join(directory, filename), 'r') as open_file:
                yield json.load(open_file)

helpers.bulk(es, load_json(sys.argv[1]), index='my-index', doc_type='my-type')
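
helpers.bulk returns a tuple of the number of successfully executed actions and a list of errors, so a minimal sketch of confirming that everything was indexed could look like this:

success, errors = helpers.bulk(es, load_json(sys.argv[1]), index='my-index', doc_type='my-type')
print('Indexed %d documents, %d errors' % (success, len(errors)))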
Honza Král
  • How do I get the IDs of the JSON documents that are indexed? – anshaj May 16 '17 at 09:24
  • If you care about the IDs (Elasticsearch will create random ones for you otherwise), just have an `_id` field in your JSON, either directly, or maybe put the filename there or something (see the first sketch below) – Honza Král May 17 '17 at 20:14
  • This throws an error in the action parameter of bulk: `TypeError: pop() takes at most 1 argument (2 given)`, raised at `op_type = data.pop("_op_type", "index")` in elasticsearch/helpers/actions.py (expand_action) (see the second sketch below) – Murtaza Haji Apr 07 '20 at 20:27
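
A minimal sketch of the `_id` suggestion, as a drop-in replacement for the load_json above, assuming each file holds a single JSON object and that the filename (without its extension) is a sensible ID:

def load_json(directory):
    """Yield documents carrying an explicit _id derived from the filename."""
    for filename in os.listdir(directory):
        if filename.endswith('.json'):
            with open(os.path.join(directory, filename), 'r') as open_file:
                doc = json.load(open_file)
                # The bulk helper treats "_id" as document metadata, so
                # Elasticsearch uses it instead of generating a random ID.
                doc['_id'] = os.path.splitext(filename)[0]  # e.g. "merged_file 1"
                yield doc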
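
That TypeError is what the bulk helper raises when an action is a list rather than a dict (list.pop only accepts an index), which happens if a file contains a JSON array: json.load then returns a list, not an object. A sketch of one way around that, again as a replacement for the load_json above, yielding each element separately:

def load_json(directory):
    """Yield one dict per document, even when a file contains a JSON array."""
    for filename in os.listdir(directory):
        if filename.endswith('.json'):
            with open(os.path.join(directory, filename), 'r') as open_file:
                data = json.load(open_file)
                # A top-level array means several documents in one file.
                if isinstance(data, list):
                    for doc in data:
                        yield doc
                else:
                    yield data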