For reasons I don't control, I have to convert (English) text with some Unicode characters to ASCII (for further processing elsewhere). For example:

Deutsche Börse 

When I do:

u'Deutsche Börse'.encode(encoding='ascii', errors='replace')

I get

b'Deutsche B?rse'

Which is not exactly what I need. Ideally I would like to get Deutsche Borse.

I realize, of course, that for the vast majority of Unicode characters that's not possible. But for many important names, such as Québec, it is possible in principle.

How can I do that?

mklement0
LetMeSOThat4U
  • a German would say that it should be converted to "Deutsche Boerse". If you just need to remove the diacritics then it's a duplicate of [What is the best way to remove accents in a Python unicode string?](https://stackoverflow.com/q/517923/995714) – phuclv Apr 04 '18 at 13:48
  • Possible duplicate of [What is the best way to remove accents in a Python unicode string?](https://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string) – phuclv Apr 05 '18 at 16:59
  • I don't have enough reputation to comment, but I just googled it and found a similar question: https://stackoverflow.com/questions/8087381/approximately-converting-unicode-string-to-ascii-string-in-python – Sergey Shayderov Apr 04 '18 at 13:49

2 Answers

Use the unicodedata module.

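Example:

import unicodedata
s = u'Deutsche Börse'
# NFKD splits 'ö' into 'o' plus a combining diaeresis; 'ignore' then drops the mark
print(unicodedata.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii'))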

Output:

Deutsche Borse
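
Why this works: NFKD normalization decomposes a character such as 'ö' into the base letter 'o' followed by a combining mark (U+0308), and encoding to ASCII with errors='ignore' then discards the mark because it is not an ASCII character. A minimal Python 3 sketch of the same idea follows; the helper name to_ascii is made up for illustration and is not part of unicodedata.

```python
import unicodedata

def to_ascii(text):
    """Approximate text in ASCII by dropping diacritics (illustrative helper name)."""
    # NFKD splits 'ö' into 'o' + U+0308 (combining diaeresis).
    decomposed = unicodedata.normalize('NFKD', text)
    # errors='ignore' silently drops every non-ASCII code point,
    # including the combining marks; decode to get a str back.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii('Deutsche Börse'))  # Deutsche Borse
print(to_ascii('Québec'))          # Quebec
```

Decoding at the end returns a str rather than bytes, which is usually more convenient if the result is passed on as text.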
Rakesh

To convert to ASCII, you might want to try the unicodedata module:

import unicodedata

data = u'Deutsche Börse'

print(unicodedata.normalize('NFKD', data).encode('ascii', 'ignore'))

Output

b'Deutsche Borse'
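
One caveat: characters that have no decomposition, such as 'ß', are dropped entirely by this approach, so 'Straße' would come out as 'Strae'. If you prefer the German convention mentioned in the comments ('ö' becomes 'oe'), one possible sketch is to apply a substitution table before normalizing. The table GERMAN_MAP and the function to_ascii_german below are hypothetical names, and the mapping is deliberately incomplete; treat it as an assumption to adapt, not a standard.

```python
import unicodedata

# Hypothetical, deliberately incomplete mapping; extend it for your data.
GERMAN_MAP = str.maketrans({
    'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
    'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue',
    'ß': 'ss',
})

def to_ascii_german(text):
    """Transliterate German umlauts and 'ß', then strip remaining diacritics."""
    # Compose first (NFC) so that 'o' + combining mark matches the table keys.
    text = unicodedata.normalize('NFC', text).translate(GERMAN_MAP)
    # Fall back to dropping diacritics for everything not in the table.
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')

print(to_ascii_german('Deutsche Börse'))  # Deutsche Boerse
print(to_ascii_german('Straße'))          # Strasse
```

Whether 'oe' or plain 'o' is the right target depends on the language of the name, so a single table will not suit every input.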
toheedNiaz
  • A good answer, but, unfortunately, due to working on an answer in parallel unbeknownst to each other, an effective duplicate of Rakesh's (who happened to press `Post Your Answer` a mere minute or so earlier). – mklement0 Apr 04 '18 at 13:59