I have a Logstash process consuming data from a Kafka topic. The messages in the topic are already JSON, and Logstash simply pushes them into Elasticsearch. In doing so, however, Logstash changes the order of the fields. Another team consumes the data in CSV format, so the changed ordering causes them trouble. What could be the reason?

For example, given the input JSON {"foo1":"bar1","foo2":"bar2"}, the document ends up in Elasticsearch as {"foo2":"bar2","foo1":"bar1"}.

logstash config

input {
    kafka {
        codec => "json"
        bootstrap_servers => "localhost:9092"
        topics => ["sample-logs"]
        auto_offset_reset => "earliest"
        group_id => "logstash-consumer"
    }
}
output {
    elasticsearch {
        hosts => "localhost:9200"
        codec => json
        index => "sample-logs-es"
    }
    stdout {
        codec => rubydebug
    }
}
Shades88
  • http://stackoverflow.com/questions/3948206/json-order-mixed-up - don't rely on the order of json object fields -- ever. – Alcanzar May 09 '17 at 14:43
  • I'm not a Logstash expert, but if you just forward the data, does Logstash have to know that the data is JSON? Can't you just set the codec to something like _text_ instead of _json_? Just an educated guess ... – TobiSH May 09 '17 at 14:52
  • @Alcanzar but Elasticsearch maintains the order of the input JSON. That is, if you index JSON data into Elasticsearch directly, the order doesn't change. Hence I am wondering why only Logstash does this. – Shades88 May 10 '17 at 04:56
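
Following the advice in the comments not to rely on JSON field order, one option is for the CSV side to fix the column order itself rather than inherit it from the documents. A minimal Python sketch, assuming the documents have already been fetched as dicts; the `COLUMNS` list and the `to_csv` helper are hypothetical names for illustration, not part of Logstash or Elasticsearch:

```python
import csv
import io

# Hypothetical column order agreed on with the CSV consumers (an assumption).
COLUMNS = ["foo1", "foo2"]


def to_csv(docs, columns=COLUMNS):
    """Render dicts as CSV with a fixed column order, whatever their key order."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=columns,
        extrasaction="ignore",  # drop fields that are not in the column list
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(docs)  # keys missing from a doc become empty cells
    return buf.getvalue()


# Keys arrive in whatever order Logstash produced; the CSV does not depend on it.
print(to_csv([{"foo2": "bar2", "foo1": "bar1"}]))
```

With the column list pinned on the consumer side, a reordered document no longer changes the CSV layout.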

2 Answers

Two good reasons to keep the fields in the same order, or sorted:

  1. The _source fields compress better if you have a lot of similar data
  2. It is easier for humans looking at the data in Kibana

I have Logstash Ruby scripts that correct for version updates in the processing code and for some past mistakes. Sadly, I also get JSON in random field order out of them, and I have no idea yet how to get it sorted again for ingestion into Elasticsearch. A crude approach would be to dump everything to a file, sort the keys with jq, and then ingest directly.
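
The dump-and-sort step could also be done without jq. A minimal Python sketch, assuming the dump is newline-delimited JSON (one document per line); `sort_keys_ndjson` is a hypothetical helper name. It relies on `json.dumps(..., sort_keys=True)`, which sorts keys at every nesting level:

```python
import json


def sort_keys_ndjson(lines):
    """Yield each JSON line re-serialized with its keys in sorted order."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the dump
        doc = json.loads(line)
        yield json.dumps(
            doc,
            sort_keys=True,          # sorts keys of nested objects too
            separators=(",", ":"),   # compact output, no extra spaces
            ensure_ascii=False,      # keep non-ASCII text readable
        )


# Example with an in-memory "dump"; a real run would iterate over a file.
dump = ['{"foo2":"bar2","foo1":"bar1"}']
for out in sort_keys_ndjson(dump):
    print(out)
```

This only helps if the sorted lines are then indexed by something that keeps the order it is given, which is an assumption to verify for the ingestion path in use.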

Hans

Set pipeline.workers to 1; otherwise multiple workers will run the filter and output stages in parallel.

xianyunlan