
I have an index called users. These are my documents:

doc1 = {"user_id":1, "name":"first1 last1"}
doc2 = {"user_id":2, "name":"first2 last2"}

I'm trying to do a bulk insert using Python's requests library:

data_as_str = ""
data_as_str += json.dumps({ "_index": "users"}) + "\n"
data_as_str += json.dumps(doc1) + "\n"
data_as_str += json.dumps({ "_index": "users"}) + "\n"
data_as_str += json.dumps(doc2) + "\n"

headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}

r = requests.post("https://ES_HOST/_bulk", auth=awsauth, headers=headers, data=data_as_str)

The error I get is illegal_argument_exception:

Malformed action/metadata line [1], expected START_OBJECT but found [START_ARRAY]

I've tried putting it inside a list and adding extra newlines, etc.

EDIT:

If I send json instead of data:

r = requests.post(bulkurl, auth=awsauth, headers=headers, json=data_as_str)

then the error is "The bulk request must be terminated by a newline [\n]".

But I do end it with a newline.

user984003
  • Does this answer your question? [Python-automated bulk request for Elasticsearch not working "must be terminated by a newline"](https://stackoverflow.com/questions/61866140/python-automated-bulk-request-for-elasticsearch-not-working-must-be-terminated) – Joe - GMapsBook.com Dec 11 '20 at 23:35

1 Answer


This works:

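I had the format of the first dict (the action line) wrong: the metadata has to be wrapped in an operation key such as index. I also changed the Content-Type header to application/x-ndjson, although the original header seems to work as well. Send the body with data, not json, in the post request.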

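import json
import requests

# doc1, doc2 and awsauth are defined as in the question.
# Each document is preceded by an action line of the form {"index": {...}}.
data_as_str = ""
data_as_str += json.dumps({"index": {"_index": "users", "_id": 1}}) + "\n"
data_as_str += json.dumps(doc1) + "\n"
data_as_str += json.dumps({"index": {"_index": "users", "_id": 2}}) + "\n"
data_as_str += json.dumps(doc2) + "\n"

headers = {'Content-Type': 'application/x-ndjson'}

# Pass the NDJSON string via data= so it is sent unchanged.
r = requests.post("https://ES_HOST/_bulk", auth=awsauth, headers=headers, data=data_as_str)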
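
For more than a couple of documents, the same body can be built in a loop. The following is only a minimal sketch: the helper name build_bulk_body and its id_field parameter are made up for illustration, and it assumes each document's user_id should be used as the _id, as in the two lines above. The last part shows what I believe was going wrong with json=: requests JSON-encodes whatever is passed there, so the whole string turns into one quoted JSON string that no longer ends with a newline.

```python
import json


def build_bulk_body(index, docs, id_field="user_id"):
    """Build an NDJSON body: one action line, then one source line, per document.

    Hypothetical helper; id_field is an assumption about which key holds the _id.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_id": doc[id_field]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk endpoint requires the body to end with a newline.
    return "\n".join(lines) + "\n"


doc1 = {"user_id": 1, "name": "first1 last1"}
doc2 = {"user_id": 2, "name": "first2 last2"}
body = build_bulk_body("users", [doc1, doc2])

# Roughly what gets sent with json=body: the string is JSON-encoded again,
# so the payload starts and ends with a double quote instead of a newline.
as_json_param = json.dumps(body)

print(body.endswith("\n"))           # True
print(as_json_param.endswith("\n"))  # False
```

The resulting body would then be posted with data=body, exactly like data_as_str above. The bulk API reports per-item failures in the errors field of the response body, so it may be worth checking r.json()["errors"] after the request.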
user984003