0

I have the following output from a REST request:

b'{
    "recipient": {
        "address": "v\xc3\xa4g 4",
    },
}'

The \xc3\xa4 comes from the character ä. I'm looking for a simple way to normalize the entire content above so that \xc3\xa4 is replaced by something meaningful (e.g. a).

My script processes addresses in many different languages so I'm looking for something that provides a meaningful replacement across many different special characters.

Tomerikoo
  • 18,379
  • 16
  • 47
  • 61
mfcss
  • 1,039
  • 1
  • 9
  • 25
  • 1
    The question is not exactly the same as yours, but the answer will probably fit you as well: https://stackoverflow.com/a/2633310/791430 – Omer Tuchfeld Apr 13 '21 at 07:55
  • So you want to denormalize `ä` and print only the latin character, right? – Liscare Apr 13 '21 at 08:13
  • I'm looking to convert `"v\xc3\xa4g 4"` to `"vag 4"` – mfcss Apr 13 '21 at 08:18
  • 1
    Do you know ahead of time which strings you expect to receive? Would a simple string replace work? – Ophir S Apr 13 '21 at 08:26
  • `b'{\n"recipient": {\n"address": "v\xc3\xa4g 4",\n},\n}'.decode('utf-8')` – JosefZ Apr 13 '21 at 09:17
  • I do not know ahead of time what strings I'll receive - essentially this is a script that processes shipping labels for shipments to various countries. The input strings are based on user inputs and may use various country-specific characters. I'm therefore looking for something that takes many such special characters and translates them to the nearest utf-8 equivalent character – mfcss Apr 13 '21 at 12:01

0 Answers0