
I am trying to add a JSON file with around 30,000 lines to Elasticsearch, but it is not properly formatted. I'm trying to upload it via the Bulk API, but I can't find a way to format it that actually works. I'm using Ubuntu 16.04 LTS.

This is the format of the JSON:

{
    "rt": "2018-11-20T12:57:32.292Z",
    "source_info": { "ip": "0.0.60.50" },
    "end": "2018-11-20T12:57:32.284Z",
    "severity": "low",
    "duid": "5b8d0a48ba59941314e8a97f",
    "dhost": "004678",
    "endpoint_type": "computer",
    "endpoint_id": "8e7e2806-eaee-9436-6ab5-078361576290",
    "suser": "Katerina",
    "group": "PERIPHERALS",
    "customer_id": "a263f4c8-942f-d4f4-5938-7c37013c03be",
    "type": "Event::Endpoint::Device::AlertedOnly",
    "id": "83d63d48-f040-2485-49b9-b4ff2ac4fad4",
    "name": "Peripheral allowed: Samsung Galaxy S7 edge"
}

I do know that the Bulk API format needs {"index":{"_id":*}} before each JSON object in the file, so it would look like this:

{"index":{"_id":1}}

{
    "rt": "2018-11-20T12:57:32.292Z",
    "source_info": { "ip": "0.0.60.50" },
    "end": "2018-11-20T12:57:32.284Z",
    "severity": "low",
    "duid": "5b8d0a48ba59941314e8a97f",
    "dhost": "004678",
    "endpoint_type": "computer",
    "endpoint_id": "8e7e2806-eaee-9436-6ab5-078361576290",
    "suser": "Katerina",
    "group": "PERIPHERALS",
    "customer_id": "a263f4c8-942f-d4f4-5938-7c37013c03be",
    "type": "Event::Endpoint::Device::AlertedOnly",
    "id": "83d63d48-f040-2485-49b9-b4ff2ac4fad4",
    "name": "Peripheral allowed: Samsung Galaxy S7 edge"
}

If I insert the index id manually and then run curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/ivc/default/_bulk?pretty --data-binary @results.json, it uploads with no errors.

My question is: how can I add the index line {"index":{"_id":*}} before each JSON object to make the file ready to upload? Obviously the index id has to increase by 1 for each document. Is there any way to do it from the CLI?
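One CLI sketch, assuming the file already holds exactly one JSON document per line (the sample documents and file names here are illustrative, not from the real export): awk can prepend an action line with an incrementing _id, since NR is the current line number.

```shell
# Illustrative input: one JSON document per line
printf '%s\n' '{"suser":"Katerina"}' '{"suser":"Eloy"}' > docs.ndjson

# Prepend {"index":{"_id":N}} before each document, N counting up from 1
awk '{ printf("{\"index\":{\"_id\":%d}}\n%s\n", NR, $0) }' docs.ndjson > bulk.json
```

The output alternates action and document lines, which is what --data-binary @bulk.json expects.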

Sorry if this post doesn't look as it should; I've read millions of posts on Stack Overflow but this is my first one! #Desperate

Thank you very much in advance!

Eloy
  • This answer might help: https://stackoverflow.com/a/45604500/4604579 – Val Jan 03 '19 at 13:48
  • Well not sure you can do it with CLI, but take a look at logstash, should be fast. – LeBigCat Jan 03 '19 at 14:02
  • Thank you guys! I'm going to try with Val's solution! – Eloy Jan 03 '19 at 14:25
  • This solution with jq doesn't work unfortunately. :( It would put an "index" after every field of every json object. I want the "index" after every whole json object so Bulk API will accept it. It does not with this format obviously. :/ – Eloy Jan 03 '19 at 14:35

2 Answers


Your problem is that Elasticsearch expects each document to be valid JSON on ONE line, like this:

{"index":{"_id":1}}
{"rt":"2018-11-20T12:57:32.292Z","source_info":{"ip":"0.0.60.50"},"end":"2018-11-20T12:57:32.284Z","severity":"low","duid":"5b8d0a48ba59941314e8a97f","dhost":"004678","endpoint_type":"computer","endpoint_id":"8e7e2806-eaee-9436-6ab5-078361576290","suser":"Katerina","group":"PERIPHERALS","customer_id":"a263f4c8-942f-d4f4-5938-7c37013c03be","type":"Event::Endpoint::Device::AlertedOnly","id":"83d63d48-f040-2485-49b9-b4ff2ac4fad4","name":"Peripheral allowed: Samsung Galaxy S7 edge"}

You have to find a way to transform your input file so that you have a document per line, then you'll be good to go with Val's solution.
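One way to do that transformation (a sketch; file names are illustrative): jq -c re-serializes every document onto a single compact line, whether the input is a stream of pretty-printed objects or one big array.

```shell
# Illustrative input: pretty-printed objects concatenated in one file
printf '{\n  "severity": "low"\n}\n{\n  "severity": "high"\n}\n' > pretty.json

# -c (compact) emits each top-level document on its own line
jq -c '.' pretty.json > one-per-line.json
```

From there the incrementing action lines can be added as in the other suggestions.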

Christophe Quintard
  • Thank you for your reply Christophe! In fact, when I open the file in Pluma, it shows only two lines: the 1st is the index and the 2nd is the body of the JSON on one line only. Sorry, I am quite lost – Eloy Jan 03 '19 at 15:17

Thank you for all the answers, they did help to get me in the right direction.

I've made a bash script to automate the download, formatting and upload of the logs to Elasticsearch:

#!/bin/bash

echo "Downloading logs from Sophos Central. Please wait."

cd /home/user/ELK/Sophos-Central-SIEM-Integration/log

#This deletes the last batch of results
rm result.json
cd .. 

#This triggers the script to download a new batch of logs from Sophos

./siem.py
cd /home/user/ELK/Sophos-Central-SIEM-Integration/log

#Inserts the first bulk action line at the top of the file
sed -i '1 i\{"index":{}}' result.json

#Adds a bulk action line on every 2nd line starting at line 3 (GNU sed first~step address)
sed -i '3~2s/^/{"index":{}}/' result.json

#Uploads the formatted file to Elasticsearch via the Bulk API
curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/ivc/default/_bulk?pretty --data-binary @result.json

So that's how I achieved this. There might be easier options but this one did the trick for me. Hope it can be useful for someone else!
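For comparison, the two sed passes above can collapse into a single one when the input holds exactly one document per line with no separator lines (a sketch, with an illustrative sample file; the \n in the replacement needs GNU sed, which Ubuntu ships):

```shell
# Illustrative input: one JSON document per line, no blank separators
printf '%s\n' '{"dhost":"004678"}' '{"dhost":"004679"}' > result.json

# Prepend a bulk action line (on its own line) before every document
sed -i 's/^/{"index":{}}\n/' result.json
```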

Again thank you everyone! :D

Eloy