
I wrote a piece of code that aims at injecting IRC eggdrop logs into an Elasticsearch 6.4 database using the Python elasticsearch client's bulk helper. It is written in Python 3.7 and tested on NetBSD, Linux and Mac OS X.
Some logfiles are imported, but some fail with this error:

    elasticsearch.exceptions.RequestError: TransportError(400, 'illegal_argument_exception',
    'Malformed action/metadata line [387], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]')

I read quite a lot of elastic.co forum posts and SO answers where the OP had a formatting error in their dataset, missing fields or the like, but I can't find one in mine. Plus, again, not all the logfiles are rejected with this error.

I was also looking for an encoding error, but everything seems fine in that area.

  • Here is a sample file that produces this error. It's 100% valid JSON.
  • Here is the Python code.

Ideas?

iMil

1 Answer


In your JSON data there are newline characters inside some values, and one of the `_source` values is `null`. Python won't be able to treat those entries as dictionaries. While forming the bulk request, clean up the data before hitting the bulk API.

The JSON object at index 192 has `null` as its `_source`.

The JSON object at index 47 contains newline characters.

Please clean up those entries while sending the data to Elasticsearch.
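A minimal sketch of that cleanup step, assuming the file is a list of exported hits; the index and type names used here (`irclogs`, `doc`) and the file name in the comment are placeholders, not your actual values:

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch()  # assumes a local cluster on the default port

    def build_actions(hits, index="irclogs"):
        """Yield bulk actions, dropping hits whose _source is null and
        stripping stray newlines from string values."""
        for hit in hits:
            source = hit.get("_source")
            if source is None:       # e.g. the entry at index 192
                continue             # skip it rather than send a bad action
            cleaned = {
                k: v.replace("\n", " ") if isinstance(v, str) else v
                for k, v in source.items()   # e.g. newlines as at index 47
            }
            yield {"_index": index, "_type": "doc", "_source": cleaned}

    # hits = json.load(open("sample.json"))  -- however you load the file
    # helpers.bulk(es, build_actions(hits))

Since `helpers.bulk()` accepts any iterable of action dicts, a generator like this lets you clean the data on the fly without building a second copy of it.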

Hope this helps.

BarathVutukuri
  • Bingo, you got it, the `null` `_source` was the failing reason, thanks a lot for pointing this out!... and I am really ashamed not to have seen that one. For the record, the _pretty-print'ed JSON_ was for readability, it was actually one-lined when passed to `helpers.bulk()`. The newlines must have been interpreted by the shell when outputting the JSON. – iMil Nov 17 '18 at 12:51
  • Glad it helped :) – BarathVutukuri Nov 17 '18 at 13:08