Is there any way to import a JSON file (containing 100 documents) into an Elasticsearch server? I want to import a big JSON file into the ES server.
I know about the bulk API, but I do not want to use it because it requires manually editing fields and schemas. I would like to upload the JSON file in one shot and import it as it is. Anyway, thanks for the reply. I found stream2es (for stream input) and FSRiver; to some extent these are useful for me. – shailendra pathak Dec 18 '13 at 15:29
9 Answers
As dadoonet already mentioned, the bulk API is probably the way to go. To transform your file for the bulk protocol, you can use jq.
Assuming the file contains just the documents themselves:
$ echo '{"foo":"bar"}{"baz":"qux"}' |
jq -c '
{ index: { _index: "myindex", _type: "mytype" } },
. '
{"index":{"_index":"myindex","_type":"mytype"}}
{"foo":"bar"}
{"index":{"_index":"myindex","_type":"mytype"}}
{"baz":"qux"}
And if the file contains the documents in a top-level list, they have to be unwrapped first:
$ echo '[{"foo":"bar"},{"baz":"qux"}]' |
jq -c '
.[] |
{ index: { _index: "myindex", _type: "mytype" } },
. '
{"index":{"_index":"myindex","_type":"mytype"}}
{"foo":"bar"}
{"index":{"_index":"myindex","_type":"mytype"}}
{"baz":"qux"}
jq's `-c` flag makes sure that each document is on a line by itself.
If you want to pipe straight to curl, you'll want to use `--data-binary @-`, and not just `-d`, otherwise curl will strip the newlines again.
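For readers without jq, the same transformation can be sketched in Python using only the standard library. This is a minimal sketch, not the answerer's method: the function name `to_bulk_body` and its parameters are made up for illustration, and the `_type` field mirrors the jq example above (later Elasticsearch versions dropped mapping types, so pass `doc_type=None` there).

```python
import json

def to_bulk_body(docs, index="myindex", doc_type="mytype"):
    """Return a bulk body: one action line, then one source line, per document."""
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if doc_type is not None:
            # Hypothetical switch: leave out _type on versions without mapping types.
            meta["_type"] = doc_type
        # Compact separators keep each JSON value on a single line, like jq -c.
        lines.append(json.dumps({"index": meta}, separators=(",", ":")))
        lines.append(json.dumps(doc, separators=(",", ":")))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"

docs = json.loads('[{"foo":"bar"},{"baz":"qux"}]')
body = to_bulk_body(docs)
print(body, end="")
```

For the two sample documents this prints the same four lines as the jq invocation above.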

This answer was extremely helpful. I was able to figure out how to get this working based on your explanation alone - if I could vote this twice, I would! – Nate Barbettini Sep 14 '15 at 04:09
Thanks for the tip on using `--data-binary` - answered my question perfectly. – Darragh Enright Nov 24 '16 at 17:08
So unfortunate that Elasticsearch doesn't provide first-class support for huge JSON file import OOTB (`jq` is not feasible for Windows users and it's kinda hacky). – KrishPrabakar Nov 20 '19 at 07:13
You should use the Bulk API. Note that you will need to add a header line before each JSON document.
$ cat requests
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests; echo
{"took":7,"items":[{"create":{"_index":"test","_type":"type1","_id":"1","_version":1,"ok":true}}]}
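A bulk request can succeed as a whole while individual items fail, so it can be worth scanning the response. The following is a rough sketch, assuming the response shape shown above (an `items` list whose entries are keyed by the action name) and assuming failed items carry an `error` field; the helper name `failed_items` is hypothetical and the exact fields vary between Elasticsearch versions.

```python
import json

def failed_items(response_text):
    """Return (position, action, error) for each bulk item that reports an error."""
    response = json.loads(response_text)
    failures = []
    for position, item in enumerate(response.get("items", [])):
        # Each item has a single key naming the action ("index", "create", ...).
        for action, result in item.items():
            if "error" in result:
                failures.append((position, action, result["error"]))
    return failures

sample = '{"took":7,"items":[{"create":{"_index":"test","_type":"type1","_id":"1","_version":1,"ok":true}}]}'
print(failed_items(sample))
```

An empty list means no item in the response reported an error.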

This is a header `{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }` – dadoonet Aug 31 '15 at 12:15
It's not. And if you are using `index/type/_bulk` endpoint, you can also ignore `_index` and `_type`. – dadoonet Mar 23 '16 at 14:48
If we've removed `_id`, `_index`, and `_type`, then the object will be empty, like this: `{ "index" : { } }` . Is that OK? – Nate Anderson Apr 24 '16 at 20:19
Just for information (in case anybody comes across this question), yes, it works with an empty index header, as written by @The Red Pea – Sir4ur0n May 10 '16 at 11:30
I'm sure someone wants this so I'll make it easy to find.
FYI - This is using Node.js (essentially as a batch script) on the same server as the brand new ES instance. Ran it on 2 files with 4000 items each and it only took about 12 seconds on my shared virtual server. YMMV
var elasticsearch = require('elasticsearch'),
fs = require('fs'),
pubs = JSON.parse(fs.readFileSync(__dirname + '/pubs.json')), // name of my first file to parse
forms = JSON.parse(fs.readFileSync(__dirname + '/forms.json')); // and the second set
var client = new elasticsearch.Client({ // default is fine for me, change as you see fit
host: 'localhost:9200',
log: 'trace'
});
for (var i = 0; i < pubs.length; i++ ) {
client.create({
index: "epubs", // name your index
type: "pub", // describe the data that's getting created
id: i, // increment ID every iteration - I already sorted mine but not a requirement
body: pubs[i] // *** THIS ASSUMES YOUR DATA FILE IS FORMATTED LIKE SO: [{prop: val, prop2: val2}, {prop:...}, {prop:...}] - I converted mine from a CSV so pubs[i] is the current object {prop:..., prop2:...}
}, function(error, response) {
if (error) {
console.error(error);
return;
}
else {
console.log(response); // I don't recommend this but I like having my console flooded with stuff. It looks cool. Like I'm compiling a kernel really fast.
}
});
}
for (var a = 0; a < forms.length; a++ ) { // Same stuff here, just slight changes in type and variables
client.create({
index: "epubs",
type: "form",
id: a,
body: forms[a]
}, function(error, response) {
if (error) {
console.error(error);
return;
}
else {
console.log(response);
}
});
}
Hope I can help more than just myself with this. Not rocket science but may save someone 10 minutes.
Cheers

There's something I don't get here. Won't this make `pubs.length + forms.length` different operations? Instead of just one, which is `_bulk`'s point? I found [this thread](https://stackoverflow.com/questions/37728650/nodejs-elasticsearch-bulk-api-error-handling), where @keety's answer uses `client.bulk()` to insert everything in one operation, which makes way more sense IMO – Jeremy Thille Jul 27 '18 at 03:51
@JeremyThille indeed that is the better way and at the time I wrote this I either hadn't made my way that far in the docs or it wasn't an option yet, and this worked for my very specific use-case. Now I don't use the JS client at all and do a direct call to `/_bulk` with all the data combined. – Deryck Jul 28 '18 at 01:53
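To illustrate the point made in these comments, here is a hedged Python sketch of grouping documents into a few bulk bodies instead of issuing one request per document. The function name `bulk_batches`, the `batch_size` default and the stand-in data are assumptions for illustration; the sketch only builds the request bodies and does not call any client library.

```python
import json

def bulk_batches(docs, index, batch_size=1000):
    """Yield bulk bodies covering at most batch_size documents each."""
    for start in range(0, len(docs), batch_size):
        lines = []
        for offset, doc in enumerate(docs[start:start + batch_size]):
            # Mirror the answer's choice of using the array position as the ID.
            action = {"create": {"_index": index, "_id": str(start + offset)}}
            lines.append(json.dumps(action))
            lines.append(json.dumps(doc))
        # Each body ends with a newline, as the bulk endpoint expects.
        yield "\n".join(lines) + "\n"

pubs = [{"title": f"pub {n}"} for n in range(5)]  # stand-in for the parsed pubs.json
batches = list(bulk_batches(pubs, "epubs", batch_size=2))
print(len(batches))  # 3 request bodies instead of 5 separate create calls
```

Each yielded string would be the body of one POST to `/_bulk`; a suitable batch size depends on document size and cluster capacity.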
jq is a lightweight and flexible command-line JSON processor.
Usage:
cat file.json | jq -c '.[] | {"index": {"_index": "bookmarks", "_type": "bookmark", "_id": .id}}, .' | curl -XPOST localhost:9200/_bulk --data-binary @-
We’re taking the file file.json and piping its contents to jq first with the -c flag to construct compact output. Here’s the nugget: We’re taking advantage of the fact that jq can construct not only one but multiple objects per line of input. For each line, we’re creating the control JSON Elasticsearch needs (with the ID from our original object) and creating a second line that is just our original JSON object (.).
At this point we have our JSON formatted the way Elasticsearch’s bulk API expects it, so we just pipe it to curl which POSTs it to Elasticsearch!
Credit goes to Kevin Marsh
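The final curl step of the pipeline can also be sketched with Python's standard library. This is an illustration under assumptions, not part of the original answer: `build_bulk_request` is a made-up helper, the sample body is invented, and the `Content-Type: application/x-ndjson` header is included because newer Elasticsearch releases reject bulk requests without a content type. The request is only constructed here; sending it needs a running cluster.

```python
import urllib.request

def build_bulk_request(body, base_url="http://localhost:9200"):
    """Build (but do not send) a POST request for the _bulk endpoint."""
    return urllib.request.Request(
        base_url + "/_bulk",
        data=body.encode("utf-8"),  # sent as-is, newlines included
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )

# Invented sample body: one action line and one source line.
body = '{"index":{"_index":"bookmarks","_id":1}}\n{"id":1,"url":"http://example.com"}\n'
request = build_bulk_request(body)
print(request.get_method(), request.full_url)

# To actually send it (requires a reachable Elasticsearch node):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```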

Import, no, but you can index the documents by using the ES API.
You can use the index API to load each line (using some kind of code to read the file and make the curl calls) or the bulk API to load them all, assuming your data file can be formatted to work with it.
A simple shell script would do the trick if you are comfortable with shell, something like this maybe (not tested):
# IFS= and -r keep leading whitespace and backslashes in each JSON line intact
while IFS= read -r line
do
curl -XPOST 'http://localhost:9200/<indexname>/<typeofdoc>/' -d "$line"
done <myfile.json
Personally, I would probably use Python, with either pyes or the Elasticsearch Python client.
pyes on github
elastic search python client
Stream2es is also very useful for quickly loading data into ES and may have a way to simply stream a file in. (I have not tested it with a file, but I have used it to load Wikipedia docs for ES perf testing.)
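The line-by-line approach described in this answer can be sketched in Python as well. The snippet below covers only the reading side, assuming a file with one JSON document per line; `iter_documents` is a hypothetical helper, and each yielded document would then be passed to whichever client or HTTP call you prefer.

```python
import io
import json

def iter_documents(lines):
    """Yield (line_number, document) for each non-blank line holding one JSON value."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between documents
        try:
            yield line_number, json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc

# In-memory stand-in for open("myfile.json"); any iterable of lines works.
sample = io.StringIO('{"foo": "bar"}\n\n{"baz": "qux"}\n')
for line_number, doc in iter_documents(sample):
    print(line_number, doc)
```

Parsing each line before sending it catches malformed input early and reports the offending line number.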

Couple things you should fix: The correct method is POST, as the PUT endpoint expects you to specify an ID ([reference](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#_automatic_id_generation)); '$line' will substitute to $line literally, should be "$line". Otherwise thanks, this was just what I was looking for. – Michele De Pascalis Feb 15 '17 at 16:34
edits made - thanks for the code review on the untested snippet above. – mconlin Mar 03 '17 at 15:06
Stream2es is the easiest way IMO.
e.g. assuming a file "some.json" containing a list of JSON documents, one per line:
curl -O download.elasticsearch.org/stream2es/stream2es; chmod +x stream2es
cat some.json | ./stream2es stdin --target "http://localhost:9200/my_index/my_type"

is that second line correct? I had to use the `stdin` command, like this: `cat some.json | ./stream2es stdin --target http://localhost:9200/myindex/mytype` – natxty May 22 '15 at 18:19
You can use the Elasticsearch Gatherer plugin.
The gatherer plugin for Elasticsearch is a framework for scalable data fetching and indexing. Content adapters are implemented in gatherer zip archives which are a special kind of plugins distributable over Elasticsearch nodes. They can receive job requests and execute them in local queues. Job states are maintained in a special index.
This plugin is under development.
Milestone 1 - deploy gatherer zips to nodes
Milestone 2 - job specification and execution
Milestone 3 - porting JDBC river to JDBC gatherer
Milestone 4 - gatherer job distribution by load/queue length/node name, cron jobs
Milestone 5 - more gatherers, more content adapters
One way is to create a bash script that does a bulk insert:
curl -XPOST 'http://127.0.0.1:9200/myindexname/type/_bulk?pretty=true' --data-binary @myjsonfile.json
After you run the insert, run this command to get the count:
curl http://127.0.0.1:9200/myindexname/type/_count
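The curl command above only works if myjsonfile.json is already in bulk format (an action line before each document, newline-terminated). A quick pre-flight check could look like the sketch below; `check_bulk_text` is a hypothetical helper that only inspects the overall line structure, and it is no substitute for the server's own validation.

```python
import json

BULK_ACTIONS = {"index", "create", "update", "delete"}

def check_bulk_text(text):
    """Return a list of problems in a bulk body; an empty list means it looks well-formed."""
    problems = []
    if not text.endswith("\n"):
        problems.append("body must end with a newline")
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        try:
            action = json.loads(lines[i])
        except json.JSONDecodeError:
            problems.append(f"line {i + 1}: not valid JSON")
            i += 1
            continue
        name = next(iter(action), None) if isinstance(action, dict) else None
        if name not in BULK_ACTIONS:
            problems.append(f"line {i + 1}: expected an action line")
            i += 1
        elif name == "delete":
            i += 1  # delete actions have no following source line
        elif i + 1 >= len(lines):
            problems.append(f"line {i + 1}: action has no source line after it")
            i += 1
        else:
            i += 2  # skip the source line that belongs to this action
    return problems

print(check_bulk_text('{"index":{}}\n{"field1":"value1"}\n'))
```

A plain file of documents without action lines would be reported line by line, which is a hint that it needs converting before being posted to `_bulk`.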
