
I need to insert a JSON array into Elasticsearch. The accepted answer to the linked question suggests inserting a header line before each JSON entry. That answer is two years old; is there a better solution available now? Do I need to edit my JSON file manually?

Is there any way to import a JSON file (containing 100 documents) into an Elasticsearch server?

[
  {
    "id":9,
    "status":"This is cool."
  },
  ...
]
Community
Forkmohit

1 Answer


OK, then there's something pretty simple you can do with a short shell script (see below). The idea is that you don't edit your file manually; instead, Python reads it and creates another file in the format the _bulk endpoint expects. The script does the following:

  1. First, we declare a little Python script that reads your JSON file and creates a new one in the file format required by the _bulk endpoint.
  2. Then, we run that Python script and store the bulk file.
  3. Finally, we send the file created in step 2 to the _bulk endpoint using a simple curl command.
  4. There you go, you now have a new ES index containing your documents.
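The transformation in step 1 can also be sketched as a standalone Python function. This is only an illustration of the bulk file format, not the script from the answer; the names `to_bulk_lines` and `to_bulk_body` are made up for this example, and the sample document is the one from the question.

```python
import json

def to_bulk_lines(docs):
    """Yield the lines of a _bulk request body for a list of documents."""
    for doc in docs:
        # Action line: an empty "index" action lets Elasticsearch assign the _id.
        yield json.dumps({"index": {}})
        # Source line: the document itself, serialized on a single line.
        yield json.dumps(doc)

def to_bulk_body(docs):
    # Every line, including the last one, has to end with a newline.
    return "".join(line + "\n" for line in to_bulk_lines(docs))

docs = [{"id": 9, "status": "This is cool."}]
print(to_bulk_body(docs), end="")
```

Each document becomes two lines: the action line first, then the document.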

bulk.sh:

#!/bin/sh

# 0. Some constants to re-define to match your environment
ES_HOST=localhost:9200
JSON_FILE_IN=/path/to/your/file.json
JSON_FILE_OUT=/path/to/your/bulk.json

# 1. Python code to transform your JSON file into the bulk format
PYTHON="import json
with open('$JSON_FILE_IN') as json_in, open('$JSON_FILE_OUT', 'w') as out:
    docs = json.load(json_in)
    for doc in docs:
        # one action line, then the document itself on a single line
        out.write(json.dumps({'index': {}}) + '\n')
        out.write(json.dumps(doc) + '\n')
"

# 2. run the Python script from step 1
#    (use python3 instead if that is the command name on your system)
python -c "$PYTHON"

# 3. use the output file from step 2 in the curl command
#    (recent Elasticsearch versions require the Content-Type header)
curl -s -XPOST "$ES_HOST/index/type/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary "@$JSON_FILE_OUT"
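If curl is not available, the same request can be built with Python's standard library. The following is a minimal sketch, not part of the original script: `build_bulk_request` is a hypothetical helper, the `http://` scheme and the index name `index` are assumptions, and the URL omits the mapping type, which is the form used on Elasticsearch versions where types are deprecated or removed.

```python
import urllib.request

def build_bulk_request(es_host, index, body):
    """Build (but do not send) a POST request for the _bulk endpoint."""
    # Assumption: plain HTTP and a typeless URL of the form /<index>/_bulk.
    url = "http://%s/%s/_bulk" % (es_host, index)
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        # Newline-delimited JSON content type expected by the bulk endpoint.
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )

req = build_bulk_request("localhost:9200", "index", '{"index": {}}\n{"id": 9}\n')
print(req.get_method(), req.full_url)

# Sending needs a running server, so it is left commented out here:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```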

You need to:

  1. save the above script as bulk.sh and make it executable (i.e. chmod u+x bulk.sh)
  2. modify the three variables at the top (step 0) to match your environment
  3. run it using ./bulk.sh
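Once the script has run, curl prints the JSON response from the _bulk endpoint. That response carries a top-level `errors` flag and one entry per action in `items`, so a short check can tell you whether every document was indexed. The sketch below assumes you have captured that output as a string; `failed_items` is a hypothetical helper, and the sample response is hand-written for illustration rather than real server output.

```python
import json

def failed_items(bulk_response_text):
    """Return (position, action, status, error) for each failed bulk action."""
    response = json.loads(bulk_response_text)
    if not response.get("errors"):
        return []
    failures = []
    for position, item in enumerate(response.get("items", [])):
        # Each item has a single key naming the action, e.g. "index".
        for action, result in item.items():
            if "error" in result:
                failures.append(
                    (position, action, result.get("status"), result["error"])
                )
    return failures

# Hand-written sample: the first document succeeded, the second failed.
sample = json.dumps({
    "took": 3,
    "errors": True,
    "items": [
        {"index": {"status": 201}},
        {"index": {"status": 400, "error": {"type": "mapper_parsing_exception"}}},
    ],
})
for failure in failed_items(sample):
    print(failure)
```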
Val
  • For recent versions of Elasticsearch, you need to add the content-type to the curl request with `-H 'Content-Type: application/x-ndjson'` – Raphaël Dec 06 '19 at 13:33
  • I know this is a fairly old thread at this point, but a question. Whenever I use this on my JSON file it adds an Index string at every character. Anybody else have this problem and how did you solve it? – Christopher Adkins Apr 25 '20 at 20:11
  • @ChristopherAdkins feel free to create a new question referencing this one and illustrate your exact issue. – Val Apr 25 '20 at 20:18
  • @val opened a new question, https://stackoverflow.com/questions/61580963/insert-multiple-documents-in-elasticsearch-bluk-doc-formatter – Christopher Adkins May 03 '20 at 20:10