I have some base64 encoded text fields in some XML data.
To get all the characters showing correctly, I think I need to find an additional encoding used on this text, which is not UTF-8 by the look of it. ?And maybe some other encoding aspect too, not sure..
I am not sure what order I should be encoding and decoding here - following https://www.geeksforgeeks.org/encoding-and-decoding-base64-strings-in-python/ I tried to first:
- Encode the whole string with every possible Python2.7 encoding, then
- decode with base64
(same result each time, no standard representation of problem characters)
Then I tried:
- encode string with utf8
- decode with base64
- decode the bytes string with every possible Python2.7 encoding
However, none of these answer strings seem to get any standard representation of the problem characters, which should display as 'é' and 'ü'.
I enclose this example string, where I am sure what the final correct text should be.
Original base64 string: b64_encoded_bytes = 'R3KfbmRlciBGco5kjnJpYyBKb3Vzc2V0JiMxMzsmIzEzO3NlbGVjdGlvbiBjb21taXR0ZWUgZm9yIGFydGlzdCByZWNpZGVuY3k6IFZpbmNpYW5jZSBEZXNwcmV0LCBLb3lvIEtvdW9oLCBDaHJpc3RpbmUgbWFjZWwsIEhhbnMtVWxyaWNoIE9icmlzdCwgTmF0YT9hIFBldHJlP2luLUJhY2hlbGV6LCBQaGlsaXBwZSBWZXJnbmU='
Text string with correct 'é' and 'ü' characters at beginning, deduced from European language knowledge:
'Gründer Frédéric Jousset selection committee for artist recidency: Vinciance Despret, Koyo Kouoh, Christine macel, Hans-Ulrich Obrist, Nata?a Petre?in-Bachelez, Philippe Vergne'
Note the ' ' is HTML encoding of apparently new line character used in Windows, and '?' might also resolve to another correct character with correct encoding, or possibly '?' is actual display in original data.