JavaScript fetch encodes emojis differently from Python Requests

Question

I'm trying to change my Wi-Fi SSID to be an emoji, but the web UI doesn't allow it. Instead, I capture a valid PUT request to the router's API, copy it as a fetch call using Chrome's Dev Tools, change the SSID to an emoji, and replay the request. It works great.

However, when I try to do it using Python Requests, it escapes the emoji (🤠) to the corresponding JavaScript escapes: \uD83E\uDD20. When this gets sent along to the router, it somehow gets translated to > (a greater than sign followed by a space). This is frustrating because I'd assume that both methods would encode the emoji the same way.

Since it works with JavaScript's fetch, there must be some difference in the way the message or the emoji is being encoded.

Fetch Call: (emoji just shows up as the emoji, even when inspecting the request with Dev Tools) (edited for brevity)

fetch("https://192.168.1.1/api/wireless", {
    "credentials": "omit",
    "headers": {
        "accept": "application/json, text/plain, */*",
        "content-type": "application/json;charset=UTF-8",
        "x-xsrf-token": "[The token for this login session]"
    },
    "referrer": "https://192.168.1.1/",
    "referrerPolicy": "no-referrer-when-downgrade",
    "body": "{
        \"wifi\": [{
            \"boring key 1\": \"boring value\",
            \"boring key 2\": \"boring value\",
            \"ssid\": \"🤠\",
            \"boring key 3\": \"boring value\",
            \"boring key 4\": \"boring value\"
        }]
    }",
    "method": "PUT",
    "mode": "cors"
});

Requests Call: (edited for brevity)

res = session.put('https://192.168.1.1/api/wireless',
                   verify=False,
                   json={
                       "wifi":[{
                           "boring key 1":"boring value",
                           "boring key 2":"boring value",
                           "ssid":"🤠",
                           "boring key 3":"boring value",
                           "boring key 4":"boring value"
                       }]
                   })

So what's the difference in the way they're being encoded? And how can I see what fetch's actual output is? (Dev Tools just shows an emoji, no escape sequences.)

Also, incidentally, your router's JSON parser is probably broken: it appears to ignore the first two hex digits of each `\uXXXX` escape, so it only looks at `3E` and `20` (instead of `D83E` and `DD20`, respectively). Decoding `3E` gives `'>'` and `20` gives `' '`, which explains your perplexing results. – metatoaster, Jul 11 '19 at 04:44
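
The truncation described in this comment can be reproduced with a short Python sketch. The function `buggy_unescape` is a hypothetical model of the router's behaviour, not its actual code: it assumes the parser keeps only the low byte of each escape, which is one way to arrive at the observed `> `.

```python
import re

def buggy_unescape(json_text):
    # Hypothetical model of the router bug: for every backslash-u escape,
    # keep only the low byte (the last two hex digits) of the code unit.
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16) & 0xFF),
        json_text,
    )

# The escaped form that Requests sends by default for the emoji.
escaped = r"\ud83e\udd20"

result = buggy_unescape(escaped)
print(repr(result))  # '> '  (0x3E is '>' and 0x20 is a space)
```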

metatoaster · Accepted Answer · 2020-06-24T06:42:01.803

By default, the json argument in the Requests library serializes with the equivalent of ensure_ascii=True, so every non-ASCII character is written out as a \uXXXX escape. In effect, that put call will be sent to the server as:

PUT / HTTP/1.1
Host: 192.168.1.1
User-Agent: python-requests/2.21.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Content-Length: 24
Content-Type: application/json

{"demo": "\ud83e\udd20"}

Which is not what you want. To send the emoji itself, you will have to encode the JSON manually and provide the headers explicitly, like so:

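import json
import requests

# Serialize with ensure_ascii=False so the emoji stays a literal character,
# then encode to UTF-8 bytes so Requests sends the body unchanged.
requests.put(
    'https://192.168.1.1',
    data=json.dumps({"demo": "🤠"}, ensure_ascii=False).encode('utf8'),
    headers={'Content-Type': 'application/json'},
)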
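
To see the two request bodies without touching the router, the serialization step can be run on its own. This is a minimal sketch using only the standard json module; the `demo` key mirrors the example above, and the variable names are illustrative.

```python
import json

payload = {"demo": "\U0001F920"}  # U+1F920, the emoji from the question

# Default behaviour (ensure_ascii=True): the emoji becomes two \uXXXX escapes.
escaped_body = json.dumps(payload).encode("utf-8")

# ensure_ascii=False keeps the character, which encodes to 4 UTF-8 bytes.
raw_body = json.dumps(payload, ensure_ascii=False).encode("utf-8")

print(escaped_body)  # b'{"demo": "\\ud83e\\udd20"}'
print(raw_body)      # b'{"demo": "\xf0\x9f\xa4\xa0"}'
print(len(escaped_body), len(raw_body))  # 24 16

# A conforming JSON parser decodes both bodies to the same value.
print(json.loads(escaped_body) == json.loads(raw_body))  # True
```

Both forms are valid JSON, so the second one is only needed here because the router's parser mishandles the escaped form.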

edited Jun 24 '20 at 06:42

answered Jul 11 '19 at 03:50

metatoaster

Amazing, I spent at least 5 hours on that! Just one clarification, now Requests encodes it as `\xf0\x9f\xa4\xa0`. What's the difference between that and `\uD83E\uDD20`? Is the first UTF-8 and the second UTF-16? – Michael Kolber Jul 11 '19 at 04:24
If you are viewing this through the Python interpreter, note that the `data` argument is not given a `str` object: `encode` turns it into a `bytes` object, which has a `b` prefix when printed through the Python `repr`, and the `\x` escapes are just the ASCII rendering of the actual bytes being sent. The `"\ud83e\udd20"` fragment in the default `json.dumps` output is made of literal ASCII characters that represent the escaped Unicode code units in JSON (it is actually `\x5c\x75\x64\x38\x33\x65\x5c\x75\x64\x64\x32\x30` if printed out as `bytes` in the long, escaped form). – metatoaster Jul 11 '19 at 04:46
After lots more research I think I understand. This emoji uses 4 bytes under UTF-8, and each of `\xf0`, `\x9f`, `\xa4`, and `\xa0` are just hex representations of the bytes, `11110000 10011111 10100100 10100000`. However, when `\ud83e\udd20` was being sent (also prefixed with a `b`), they were being interpreted literally by the parser instead of being escaped. Is that correct? – Michael Kolber Jul 11 '19 at 05:10
Actually, I also never checked what the `D83E` and `DD20` code points represent: they are the high and low surrogates defined in the Unicode standard, and do not correspond to valid characters on their own. Given that a JSON `\u` escape only covers the Basic Multilingual Plane, the `1F920` code point needs those two surrogates, which are then written as `\ud83e\udd20` in the ASCII-escaped representation. However, the parser on the router then fails to properly decode this surrogate pair back into the original code point `1F920`. – metatoaster Jul 11 '19 at 05:34
[Somewhat relevant SO thread on this topic](https://stackoverflow.com/questions/11641983/encoding-json-in-utf-16-or-utf-32). – metatoaster Jul 11 '19 at 05:34
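
The surrogate-pair arithmetic mentioned in the comments can be checked by hand. The sketch below follows the standard UTF-16 rule (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves); the helper names `to_surrogates` and `from_surrogates` are made up for illustration.

```python
def to_surrogates(code_point):
    # Split a code point above the Basic Multilingual Plane into a
    # (high, low) UTF-16 surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

def from_surrogates(high, low):
    # Recombine a surrogate pair into the original code point; this is the
    # step a correct JSON parser performs on the two escapes.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print([hex(unit) for unit in to_surrogates(0x1F920)])  # ['0xd83e', '0xdd20']
print(hex(from_surrogates(0xD83E, 0xDD20)))            # 0x1f920
```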
