
This question arises from this SO thread.

Since my query is similar but not identical, it seemed best to open a separate question for others to benefit from, as @Val suggested.

So, similar to the above, I need to insert a massive amount of data into an index (my initial test set is about 10,000 documents, but that is just for a POC; there are many more). The data I would like to insert is in a .json document and looks something like this (snippet):

[ { "fileName": "filename", "data":"massive string text data here" }, 
  { "fileName": "filename2", "data":"massive string text data here" } ]

By my own admission I am new to Elasticsearch; however, from reading through the documentation, my assumption was that I could take a .json file and create an index from the data within. I have since learnt that each item within the JSON apparently needs a "header", something like:

{"index":{}}
{ "fileName": "filename", "data":"massive string text data here" }

Meaning that this is not actual JSON (as such) but rather a manipulated string (newline-delimited JSON).

I would like to know if there is a way to import my JSON data as is (in JSON format), without having to manually manipulate the text first (my test data has 10,000 entries, so I'm sure you can see why I'd prefer not to do this manually).

Any suggestions or suggested automated tools to help with this?

PS - I am using the windows installer and Postman for the calls.

Hexie

2 Answers


You can transform your file very easily with a single shell command like this. Provided that your file is called input.json, you can do this:

jq -c -r ".[]" input.json | while read -r line; do echo '{"index":{}}'; echo "$line"; done > bulk.json

After this you'll have a file called bulk.json which is properly formatted to be sent to the bulk endpoint.

Then you can call your bulk endpoint like this:

curl -XPOST localhost:9200/your_index/your_type/_bulk -H "Content-Type: application/x-ndjson" --data-binary @bulk.json

Note: You need to install jq first if you don't have it already.

Leandro Caniglia
Val
  • Tried this with a file I had ~64MB in size with ~75000 records and could do the transformation in about a minute and the load in less than 30 seconds. – Frans Sep 02 '19 at 12:45
  • Is there an alternative solution? For some reason I am not able to get jq working. I have it downloaded, but keep getting a callback saying 'jq' is not recognized when I run this command. – cluis92 Nov 14 '19 at 21:51
  • @Val I have tried the jq command within Windows Powershell and am receiving 'Missing statement body in do loop' I think it is because I am not specifying the input file correctly. For my input file I am putting "@C:\setting-es.json" .. I think I am having trouble with the syntax for jq – cluis92 Nov 17 '19 at 21:10
  • 1
    this worked for me jq -c '.[] | ({"index":{}}, [.])' activity-es-jq.json > bulk-activity.json executed from powershell – cluis92 Nov 18 '19 at 17:21
  • @cluis92 any reasons to put it in an array like `[.]` ? `jq -c '.[] | ({"index":{}}, .)'` works for me. – Kenji Noguchi Aug 18 '21 at 19:51
  • while + external command loop is very slow. jq alone is sufficient. If you can install awk in Windows Power Shell, awk is even faster. `awk '{print "{\"index\": {}}\n" $1}'` – Kenji Noguchi Aug 18 '21 at 20:05

This is my code to bulk-index data into Elasticsearch:


const es = require("elasticsearch");
const client = new es.Client({
  hosts: ["http://localhost:9200"],
});

// Load the JSON array of documents to index.
const cities = require(<path to your json file>);

// Loop through each city and push two objects into the array per iteration:
// the first object carries the index and type the document will be saved under,
// the second object is the document itself.
let bulk = [];

cities.forEach((city) => {
  bulk.push({
    index: {
      _index: <index name>,
      _type: <type name>,
    },
  });

  bulk.push(city);
});

client.bulk({ body: bulk }, function (err, response) {
  if (err) {
    console.log("Failed bulk operation", err);
  } else {
    console.log("Successfully imported %s documents", cities.length);
  }
});
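Note that the bulk API can return a successful HTTP response even when individual documents fail to index: the response body carries a top-level `errors` flag plus an `items` array with one entry per action. A minimal sketch of pulling out the failed items (the helper name and the simulated response below are illustrative, not from the answer):

```javascript
// Collect per-item failures from a bulk response. The response shape
// (top-level `errors` flag, `items` array keyed by action type) follows
// the Elasticsearch bulk API.
function findBulkErrors(response) {
  if (!response.errors) return [];
  return response.items
    .map((item, i) => ({ i, result: item.index || item.create }))
    .filter(({ result }) => result && result.error)
    .map(({ i, result }) => ({ position: i, error: result.error }));
}

// Example with a simulated response: one success, one mapping failure.
const simulated = {
  errors: true,
  items: [
    { index: { status: 201 } },
    { index: { status: 400, error: { type: "mapper_parsing_exception" } } }
  ]
};
console.log(findBulkErrors(simulated));
```

Checking `response.errors` inside the success branch of the callback avoids silently dropping documents that failed validation or mapping.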

Or you can use a library like elasticdump or elasticsearch-tools.

DamarOwen