I want to send a POST request to an app.
This code does not work properly with Cyrillic strings (the user is saved with the wrong characters):

import requests

s = requests.Session()

d = {
    'name': 'Вова',
    'surname': 'Петров',
}

response = s.post('http://localhost:8899/app/user/', json=d)

print(response.status_code)
print(response.text)

I ran netcat to inspect the raw request:
nc -kl 8899

POST /accsrv/http/person/ HTTP/1.1
Host: localhost:8899
User-Agent: python-requests/2.22.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Content-Length: 147
Content-Type: application/json

{"name": "\u0420\u2019\u0420\u0455\u0420\u0406\u0420\u00b0", "surname": "\u0420\u045f\u0420\u00b5\u0421\u201a\u0421\u0402\u0420\u0455\u0420\u0406"}

I also checked with curl, which works fine:
curl -d '{"name":"Вова", "surname":"Петров"}' -H "Content-Type: application/json" -X POST http://localhost:8899/app/user/

POST /app/user/ HTTP/1.1
Host: localhost:8899
User-Agent: curl/7.64.0
Accept: */*
Content-Type: application/json
Content-Length: 45

{"name":"Вова", "surname":"Петров"}

So I think the problem is with Unicode encoding (the server app's documentation is weak and I have no access to its source code).

How can I make the request from `requests` take the second form?

Python 3.7.3
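
For reference, a minimal sketch of the two serializations being compared, using only the standard library. It assumes `requests` serializes the `json=` argument with the default `json.dumps` settings, which escape non-ASCII characters. The commented-out `s.post(...)` call is an assumed way to send the raw form and is not executed here.

```python
import json

d = {'name': 'Вова', 'surname': 'Петров'}

# Default settings: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(d)

# ensure_ascii=False keeps the Cyrillic characters as they are;
# encoding to UTF-8 yields the bytes to use as the request body.
raw = json.dumps(d, ensure_ascii=False).encode('utf-8')

print(escaped)
print(raw.decode('utf-8'))
print(len(raw))  # body length in bytes, i.e. the Content-Length

# Assumed usage with the session from the question (not executed here):
# s.post('http://localhost:8899/app/user/', data=raw,
#        headers={'Content-Type': 'application/json; charset=utf-8'})
```

Both forms are valid JSON and decode to the same dictionary, so a standards-compliant JSON parser should treat them identically. The raw form only matters if the server does not handle `\uXXXX` escapes.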

victor1234
  • `s.post('http://localhost:8899/app/user/', data=json.dumps(d, ensure_ascii=False), headers={"Content-Type": "application/json; charset=UTF8"})` will post the JSON data using full UTF-8 byte sequences for non-ASCII characters. – Martijn Pieters Nov 23 '19 at 18:31
  • To save these characters properly (assuming you are saving to a DB), you need to make sure the encoding is right on multiple points, including the backend and the database (for db, set the right encoding via connection string) – Anis R. Nov 23 '19 at 18:33
  • Ah, what you posted is **not the same data you used in your CURL command line**. You posted `{'name': 'Р’РѕРІР°', 'surname': 'Петров'}`. – Martijn Pieters Nov 23 '19 at 18:34
  • Put differently, the error is not with the server, but with *how you saved your source code* or with how you read the data. The exact data you give `requests` is being posted, but that doesn't mean that the data was correct to begin with. – Martijn Pieters Nov 23 '19 at 18:35
  • @MartijnPieters I send the curl command from a UTF-8 console and run a Python script that is saved as UTF-8 too. – victor1234 Nov 23 '19 at 18:41
  • Using `ftfy.fixes.fix_one_step_and_explain("\u0420\u045f\u0420\u00b5\u0421\u201a\u0421\u0402\u0420\u0455\u0420\u0406")` (from the [`ftfy` project](https://ftfy.readthedocs.io/en/latest/)) shows me that you indeed have a mojibake: `('Петров', [('encode', 'sloppy-windows-1251', 3), ('decode', 'utf-8', 0)])`. – Martijn Pieters Nov 23 '19 at 18:41
  • @victor1234: but you *don't*. That's the problem. You are not sending the same data, at all. – Martijn Pieters Nov 23 '19 at 18:42
  • @MartijnPieters Sorry, which part of my question is displayed like this on your side: {'name': 'Р’РѕРІР°', 'surname': 'Петров'}? – victor1234 Nov 23 '19 at 18:42
  • @victor1234: that's the value you posted. The `"\u0420\u045f\u0420\u00b5\u0421\u201a\u0421\u0402\u0420\u0455\u0420\u0406"` JSON data. – Martijn Pieters Nov 23 '19 at 18:44
  • And in Python, using `"\u0420\u045f\u0420\u00b5\u0421\u201a\u0421\u0402\u0420\u0455\u0420\u0406".encode("sloppy-windows-1251").decode("utf8")` (with `ftfy` imported so the `sloppy-windows-1251` codec is installed), you get `'Петров'`. – Martijn Pieters Nov 23 '19 at 18:45
  • And `"\u0420\u2019\u0420\u0455\u0420\u0406\u0420\u00b0"` is `'Р’РѕРІР°'`, but with the Mojibake repaired via `"\u0420\u2019\u0420\u0455\u0420\u0406\u0420\u00b0".encode("sloppy-windows-1251").decode("utf8")` I get `'Вова'`. – Martijn Pieters Nov 23 '19 at 18:46
  • So that means I somehow saved the Cyrillic string in the Python code incorrectly, not as UTF-8, right? – victor1234 Nov 23 '19 at 18:47
  • Put differently, the code in your question, using a literal `{'name': 'Вова', 'surname': 'Петров'}` works just fine and posts `{"name": "\u0412\u043e\u0432\u0430", "surname": "\u041f\u0435\u0442\u0440\u043e\u0432"}` to the server. But that's not how your *actual* code loads the dictionary to post, is it? – Martijn Pieters Nov 23 '19 at 18:48
  • @victor1234: it means you created a [Mojibake](https://en.wikipedia.org/wiki/Mojibake), yes. Or are reading from a source that already is a Mojibake. – Martijn Pieters Nov 23 '19 at 18:48
  • @MartijnPieters Strangely, I also tested with the httpie utility, and it produces \u041f.. too. So from the same console the curl command works fine and httpie does not (neither of them reads any data from disk). – victor1234 Nov 23 '19 at 18:55
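
The mojibake diagnosis from the comments can be checked with the standard library alone. The sketch below is an illustration under one assumption: it uses Python's built-in `cp1251` codec instead of ftfy's `sloppy-windows-1251`, which happens to be enough here because every byte involved has a cp1251 mapping. The `captured` string is the body copied from the netcat output in the question.

```python
import json

# Request body as captured by netcat in the question (raw string, so the
# \uXXXX escapes stay literal until json.loads interprets them).
captured = (
    r'{"name": "\u0420\u2019\u0420\u0455\u0420\u0406\u0420\u00b0", '
    r'"surname": "\u0420\u045f\u0420\u00b5\u0421\u201a\u0421\u0402\u0420\u0455\u0420\u0406"}'
)
sent = json.loads(captured)

# Repair: re-encode the text as cp1251 to recover the original bytes,
# then decode those bytes as the UTF-8 they really were.
repaired = {k: v.encode('cp1251').decode('utf-8') for k, v in sent.items()}
print(repaired)

# Reproduce: take the intended strings, encode them as UTF-8, and
# wrongly decode the bytes as cp1251.
intended = {'name': 'Вова', 'surname': 'Петров'}
broken = {k: v.encode('utf-8').decode('cp1251') for k, v in intended.items()}

# The broken dictionary serializes to exactly the captured body.
print(json.dumps(broken) == captured)
print(len(captured))  # compare with the Content-Length header
```

If the last comparison holds, the strings were already damaged before `requests` serialized them, which points at how the script file or its input was decoded rather than at the HTTP layer.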
