
Context: I'm trying to convert characters like these:

To normal Python strings (speedy, building, tuesday, etc.) and save them into a new DataFrame to be exported to a new Excel file. For example, the character 𝕒 (U+1D552) should be converted to a (U+0061). I'm reading each string from an Excel file using read_excel. Should I pass something like encoding="utf-8" to read_excel? Or is there a way to replace those characters using re? Or even encode("ascii").decode("utf-8")?
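A quick check of the encode/decode idea, assuming the cells already arrive as ordinary Python str values. The styled letter is valid Unicode with its own code point, so this is not an encoding problem. Forcing it to ASCII either raises or silently deletes the character instead of turning it into a plain a. The variable name `styled` is just for illustration.

```python
import unicodedata

styled = "\U0001D552"  # 𝕒, the character from the example above

# Show the code point and official Unicode name of the styled letter vs. plain "a"
print(f"U+{ord(styled):04X}", unicodedata.name(styled))  # U+1D552 MATHEMATICAL DOUBLE-STRUCK SMALL A
print(f"U+{ord('a'):04X}", unicodedata.name("a"))        # U+0061 LATIN SMALL LETTER A

# Encoding to ASCII does not "convert" the letter; it raises an error...
try:
    styled.encode("ascii")
except UnicodeEncodeError as exc:
    print("encode('ascii') failed:", exc.reason)  # ordinal not in range(128)

# ...and errors="ignore" silently drops it, leaving an empty string.
print(repr(styled.encode("ascii", errors="ignore").decode("utf-8")))  # ''
```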

Thank you in advance

Chronicles

1 Answer


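Using the standard-library unicodedata module, you can normalize Unicode strings. NFKC normalization replaces compatibility characters, such as these styled mathematical letters, with their plain equivalents: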

>>> from unicodedata import normalize
>>> test_str = "   "
>>> print(normalize('NFKC', test_str))
BUILDING Speedy TUESDAY spaghetti
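A self-contained variant of the same idea, sketched under a few assumptions. It builds the styled text from code-point escapes instead of pasting it, so it survives copy-paste. It also wraps the call in a cell-level function that skips non-text values such as numbers or NaN. The names `normalize_cell` and `styled` are hypothetical, and the two Mathematical Alphanumeric ranges (bold and sans-serif bold) are only examples of styles that NFKC maps back to ASCII.

```python
from unicodedata import normalize


def normalize_cell(value):
    """NFKC-normalize text cells; leave numbers, dates and missing values (NaN) untouched."""
    if isinstance(value, str):
        return normalize("NFKC", value)
    return value


def styled(word, upper_start, lower_start):
    """Hypothetical helper: rebuild 'fancy' text from a word of ASCII letters only."""
    return "".join(
        chr(upper_start + ord(c) - ord("A")) if c.isupper()
        else chr(lower_start + ord(c) - ord("a"))
        for c in word
    )


# Start code points of two gap-free Mathematical Alphanumeric ranges
BOLD = (0x1D400, 0x1D41A)       # MATHEMATICAL BOLD CAPITAL A / SMALL A
SANS_BOLD = (0x1D5D4, 0x1D5EE)  # MATHEMATICAL SANS-SERIF BOLD CAPITAL A / SMALL A

text = styled("BUILDING", *BOLD) + " " + styled("Speedy", *SANS_BOLD)
print(normalize_cell(text))  # BUILDING Speedy
print(normalize_cell(42))    # 42
```

With pandas, the function could be applied cell by cell, e.g. `clean = df.map(normalize_cell)` (`df.applymap` in pandas versions before 2.1), followed by `clean.to_excel("output.xlsx", index=False)`; the output file name is a placeholder.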
Adid
  • Hi, thank you! It does :) The word "" still returns 2 weird characters ( and ), but I'll have a look at the package and figure something out with it :) Thanks – Chronicles May 25 '22 at 13:45
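For characters that NFKC leaves alone (they have no compatibility decomposition), one possible follow-up is to list what remains with its code point and Unicode name, then map those characters by hand with str.translate. This is a sketch: the actual leftover characters from the comment are unknown. The regional indicator letters (U+1F1E6 to U+1F1FF) are only an assumed example, and `leftover_chars`, `MANUAL` and `to_plain` are hypothetical names.

```python
import unicodedata


def leftover_chars(text):
    """Map each non-ASCII char that survives NFKC to (code point, Unicode name)."""
    found = {}
    for ch in unicodedata.normalize("NFKC", text):
        if ord(ch) > 127 and ch not in found:
            found[ch] = (f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"))
    return found


# Hypothetical manual mapping for characters without a compatibility decomposition;
# regional indicator letters A..Z are used here purely as an example.
MANUAL = {0x1F1E6 + i: chr(ord("A") + i) for i in range(26)}


def to_plain(text):
    """NFKC first, then translate whatever is left using the manual table."""
    return unicodedata.normalize("NFKC", text).translate(MANUAL)


sample = "\U0001D41B" + "\U0001F1F4\U0001F1F0"  # bold small b, regional indicators O and K
print(leftover_chars(sample))
print(to_plain(sample))  # bOK
```

Running `leftover_chars` on the problem word first shows exactly which code points need an entry in the manual table.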