Convert "weird" strings to normal python strings

Question

Context: I'm trying to convert characters like these:

To normal Python strings (speedy, building, tuesday, etc.) and save them into a new DataFrame to be exported to a new Excel file. For example, the character 𝕒 (U+1D552) should be converted to a (U+0061). I'm reading each string from an Excel file using read_excel. Should I pass something like encoding="utf-8" to read_excel? Or is there a way to replace those characters using re? Or even encode("ascii").decode("utf-8")?

Thank you in advance

You want NFKC normalization: `unicodedata.normalize('NFKC', ' ')` returns `'BUILDING Speedy TUESDAY spaghetti'` — smitop, May 25 '22 at 13:39

score 3 · Accepted Answer · answered May 25 '22 at 13:39


Using unicodedata you can normalize unicode strings:

>>> from unicodedata import normalize
>>> test_str = "   "
>>> print(normalize('NFKC', test_str))
BUILDING Speedy TUESDAY spaghetti
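
To apply the same normalization to every cell read from a spreadsheet, a minimal sketch could look like the following. The helper name `normalize_text` and the sample values are hypothetical and only illustrate the idea. The stylized letters are valid Unicode code points rather than mis-decoded bytes, so this is presumably not an encoding problem, and calling `.encode("ascii")` on them would raise `UnicodeEncodeError` instead of converting them.

```python
import unicodedata


def normalize_text(value):
    """Return the NFKC-normalized form of a string; pass other values through.

    Hypothetical helper: non-string cells (numbers, None, NaN) are left
    unchanged so it can be applied to a whole column safely.
    """
    if isinstance(value, str):
        return unicodedata.normalize("NFKC", value)
    return value


# Hypothetical cell values standing in for what read_excel might return:
# double-struck letters, fullwidth letters, and an empty cell.
cells = ["\U0001D552\U0001D553\U0001D554", "\uFF46\uFF55\uFF4C\uFF4C", None]
cleaned = [normalize_text(cell) for cell in cells]
print(cleaned)  # ['abc', 'full', None]
```

With pandas, the same function could be passed to a column, for example `df["word"] = df["word"].map(normalize_text)`, where `df` and the column name `word` are assumptions about the asker's data.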


Adid


Hi, thank you! It does :) The word "" still returns 2 weird characters ( and ), but I'll have a look at the package and figure something out with it :) Thanks – Chronicles May 25 '22 at 13:45
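
Some stylized characters have no compatibility mapping, so NFKC leaves them unchanged. One way to track these down is sketched below. The function names, the sample string, and the `FALLBACK` table are all hypothetical. The sample uses U+1D00 (a small-capital letter) as an example of a character that survives NFKC; the asker's actual leftover characters are not known.

```python
import unicodedata


def leftover_chars(text):
    """List (char, code point, name) for non-ASCII characters that survive NFKC."""
    normalized = unicodedata.normalize("NFKC", text)
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in normalized
        if ord(ch) > 127
    ]


# Hypothetical manual mapping for characters NFKC does not convert.
# Keys are code points, values are the desired replacements.
FALLBACK = {0x1D00: "a"}


def to_plain(text):
    """Apply NFKC first, then the manual fallback table."""
    return unicodedata.normalize("NFKC", text).translate(FALLBACK)


sample = "\U0001D552 \u1D00"  # double-struck a, space, small-capital A
print(leftover_chars(sample))  # [('ᴀ', 'U+1D00', 'LATIN LETTER SMALL CAPITAL A')]
print(to_plain(sample))        # a a
```

Printing the Unicode names shows which characters still need handling, and each one can then be added to the fallback table by hand.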
