
I am trying to automate a bulk request for Elasticsearch via Python.

To do this, I prepare the data for the request body as follows (stored in a list, one row per element):

data = [{"index":{"_id": ID}}, {"tag": {"input": [tag], "weight":count}}]

Then I use `requests` to make the API call:

r = requests.put(endpoint, json = data, auth = auth)

This is giving me the Error: b'{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The bulk request must be terminated by a newline [\\n]"}],"type":"illegal_argument_exception","reason":"The bulk request must be terminated by a newline [\\n]"},"status":400}'

I know that I need to put a newline at the end of the request, and that is where my problem lies: how can I append a newline to this data structure? I tried appending '\n' to the end of my list, but that didn't work.
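Why appending '\n' to the list cannot work: with `json=data`, `requests` serializes the whole list as a single JSON document (essentially `json.dumps(data)`), so the list becomes one JSON array on one line and a '\n' element is just an escaped string inside it. A minimal standard-library illustration; the concrete values are placeholders for the question's `ID`, `tag` and `count`:

```python
import json

# placeholder values standing in for ID, tag and count from the question
data = [{"index": {"_id": 1}}, {"tag": {"input": ["python"], "weight": 10}}]
data.append("\n")  # the attempted fix: add a newline element to the list

# roughly what requests sends for json=data: one JSON array on a single line,
# where the appended element is encoded as the two characters backslash + n
body = json.dumps(data)
print(body)
```

The bulk API instead expects newline-delimited JSON: one JSON object per line, with every line (including the last) terminated by a real newline, so the body has to be built as a string rather than passed as a Python list.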

Thank you guys!

  • Go a level lower and try to `curl` the endpoint from command line. See if this resolves your issues https://discuss.elastic.co/t/issue-with-json-bulk-insert-the-bulk-request-must-be-terminated-by-a-newline-n/165902/4 When that works you can try to adjust your python code. – Tin Nguyen May 18 '20 at 09:16
  • Thank you for your response! I already made a bulk request from Postman to the same endpoint, which worked perfectly! – Lukas Zimmermann May 18 '20 at 09:28
  • Ensure the API calls are exactly the same `put` vs `post`, headers, auth, data... – Tin Nguyen May 18 '20 at 09:36
  • It is exactly the same. I also made other requests (GET and POST) to the endpoint, which worked perfectly. – Lukas Zimmermann May 18 '20 at 09:52
  • Capture the API requests with Postman and you'll see they aren't exactly the same. If they were, you wouldn't get an error from Python and none without it. – Tin Nguyen May 18 '20 at 10:46
  • That's right. The problem is that I can't get my JSON data into the right format in Python (in this case NDJSON, terminated by a \n), whereas in Postman I can. That's where I am struggling. – Lukas Zimmermann May 18 '20 at 11:19



The payload's content type must be NDJSON, and the index attribute needs to be specified as well. Here's a working snippet:

import requests
import json

endpoint = 'http://localhost:9200/_bulk'


# "_index" names the target index; it is needed here because the URL is the bare /_bulk path
data = [{"index": {"_index": "123", "_id": 123}},
        {"tag": {"input": ['tag'], "weight": 10}}]


# one JSON document per line, joined with '\n', plus the mandatory trailing '\n'
payload = '\n'.join([json.dumps(line) for line in data]) + '\n'

r = requests.put(endpoint,
                 # `data` instead of `json`!
                 data=payload,
                 headers={
                     # it's a requirement
                     'Content-Type': 'application/x-ndjson'
                 })

print(r.json())
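The same idea extends to many documents: each document contributes an action line followed by a source line, and the whole body ends with a newline. A sketch of such a builder; the name `to_bulk_ndjson` and the `(doc_id, tag, count)` row shape are hypothetical, modeled on the question's `ID`, `tag` and `count`:

```python
import json


def to_bulk_ndjson(rows, index):
    """Build an NDJSON _bulk body from (doc_id, tag, count) rows (hypothetical row shape)."""
    lines = []
    for doc_id, tag, count in rows:
        # action line: target index and document id
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # source line: the document itself
        lines.append(json.dumps({"tag": {"input": [tag], "weight": count}}))
    # every line, including the last one, must be terminated by a newline
    return "\n".join(lines) + "\n"


payload = to_bulk_ndjson([(1, "python", 10), (2, "elasticsearch", 3)], "123")
print(payload, end="")
```

The resulting string can be sent with `data=payload` and the `application/x-ndjson` header exactly as in the snippet above.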

P.S.: You may want to consider the `bulk` helper (`elasticsearch.helpers.bulk`) in the official Python client, which builds the NDJSON body for you.
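A minimal sketch of the helper route, assuming the official `elasticsearch` package is installed and a cluster is reachable at `http://localhost:9200`. `helpers.bulk` accepts an iterable of action dicts and handles the NDJSON encoding itself; the function names and the `(doc_id, tag, count)` row shape are hypothetical:

```python
def tag_actions(rows, index):
    """Yield one bulk action per (doc_id, tag, count) row (hypothetical row shape)."""
    for doc_id, tag, count in rows:
        yield {
            "_index": index,  # target index
            "_id": doc_id,    # document id
            "_source": {"tag": {"input": [tag], "weight": count}},
        }


def bulk_index_tags(rows, index, url="http://localhost:9200"):
    """Send the actions via helpers.bulk (assumes a reachable cluster at `url`)."""
    # imported here so tag_actions() can be used without the client installed
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(url)
    # helpers.bulk serializes the actions and posts them to the _bulk endpoint;
    # it returns a tuple (number of successful actions, list of errors)
    return helpers.bulk(es, tag_actions(rows, index))
```

Usage, against a running cluster: `bulk_index_tags([(1, "python", 10)], "123")`. Separating the action generator from the client call keeps the document-shaping logic testable on its own.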

Joe - GMapsBook.com
  • That solution works perfectly, thank you very much! However, I also tried it without specifying a name in the index attribute, and it worked too. Are there any downsides to doing so? – Lukas Zimmermann May 19 '20 at 07:50
  • No prob! It's required when you don't specify it in the `_bulk` path -- otherwise ES wouldn't know where to put the docs (no pun intended). I reckon it worked for you because you *did* specify it in your endpoint. More on this here: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html#bulk-api-request-body – Joe - GMapsBook.com May 19 '20 at 07:55
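To make that rule concrete: when the index is part of the URL (`/<index>/_bulk`), action lines may omit `_index`; with the bare `/_bulk` endpoint, every action line has to name it. A small sketch that chooses the URL and checks the rule before sending; `bulk_url` is a hypothetical helper, and it assumes `data` alternates action and source lines as in the answer (true for `index`/`create`, not for `delete`, which has no source line):

```python
def bulk_url(base, data, index=None):
    """Return the _bulk URL, making sure the target index is known (hypothetical helper)."""
    if index is not None:
        # index given in the path: action lines may omit "_index"
        return f"{base}/{index}/_bulk"
    # bare /_bulk: every action line (even positions) must carry "_index"
    for action in data[0::2]:
        (meta,) = action.values()  # e.g. {"_index": "123", "_id": 123}
        if "_index" not in meta:
            raise ValueError(f"no index in the URL and no _index in action: {action!r}")
    return f"{base}/_bulk"


data = [{"index": {"_id": 123}}, {"tag": {"input": ["tag"], "weight": 10}}]
print(bulk_url("http://localhost:9200", data, index="123"))
```

Calling `bulk_url("http://localhost:9200", data)` with the same `data` raises `ValueError`, which mirrors the situation Elasticsearch would reject.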