"ASCII-ish" approximation of some Unicode characters in Python

Question

For reasons I don't control, I have to convert (English) text with some Unicode characters to ASCII (for further processing elsewhere). For example:

Deutsche Börse

When I do:

u'Deutsche Börse'.encode(encoding='ascii', errors='replace')

I get

b'Deutsche B?rse'

That is not exactly what I need. Ideally I would like to get Deutsche Borse.
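For context, `errors='replace'` is only one of several built-in error handlers for `str.encode`, and none of them approximate a character; they either raise, drop, or escape it. A minimal sketch comparing them on the example string (the helper name `encode_all_ways` is hypothetical):

```python
def encode_all_ways(text):
    """Return {handler: result} for several built-in str.encode error handlers."""
    results = {}
    for handler in ('strict', 'ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace'):
        try:
            results[handler] = text.encode('ascii', errors=handler)
        except UnicodeEncodeError:
            # 'strict' (the default) raises instead of substituting anything
            results[handler] = None
    return results

for handler, result in encode_all_ways('Deutsche Börse').items():
    print(f'{handler:<18} {result!r}')
```

For instance, 'ignore' yields b'Deutsche Brse' and 'xmlcharrefreplace' yields b'Deutsche B&#246;rse'; none of them produce the wanted "Borse".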

I realize, of course, that for the vast majority of Unicode characters that's not possible. But for many important names, like, say, Québec, it is possible in principle.

How can I do that?

A German would say that it should be converted to "Deutsche Boerse". If you just need to remove the diacritics, then it's a duplicate of [What is the best way to remove accents in a Python unicode string?](https://stackoverflow.com/q/517923/995714) — phuclv, Apr 04 '18 at 13:48
Possible duplicate of [What is the best way to remove accents in a Python unicode string?](https://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string) — phuclv, Apr 05 '18 at 16:59
I don't have enough reputation to comment, but I just googled it and found a similar question: https://stackoverflow.com/questions/8087381/approximately-converting-unicode-string-to-ascii-string-in-python — Sergey Shayderov, Apr 04 '18 at 13:49
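If the German convention from the first comment ("Deutsche Boerse") is wanted instead, one option is to apply a small hand-written mapping with `str.translate` before falling back to NFKD for any other accents. This is a minimal sketch; the table `GERMAN_MAP` and the function `to_ascii_german` are hypothetical and cover only the German umlauts and ß:

```python
import unicodedata

# Hypothetical German transliteration table; extend it for other languages.
GERMAN_MAP = str.maketrans({
    'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
    'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue',
    'ß': 'ss',
})

def to_ascii_german(text):
    """Transliterate German umlauts/ß, then strip remaining accents via NFKD."""
    # Compose first so a decomposed 'o' + U+0308 is seen as 'ö' by the table
    text = unicodedata.normalize('NFC', text).translate(GERMAN_MAP)
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii_german('Deutsche Börse'))    # Deutsche Boerse
print(to_ascii_german('Straße in Québec'))  # Strasse in Quebec
```

The mapping runs first because NFKD alone would turn "ö" into plain "o" and silently drop "ß".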

score 2 · Answer 1 · answered Apr 04 '18 at 13:48


Use the unicodedata module.

Ex:

import unicodedata
s = u'Deutsche Börse'
# NFKD splits 'ö' into 'o' + a combining diaeresis; 'ignore' then drops the mark
print(unicodedata.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii'))

Output:

Deutsche Borse

answered Apr 04 '18 at 13:48

Rakesh

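To see why this works, it can help to look at the code points NFKD produces. A small sketch (the helper `show_decomposition` is a hypothetical name) listing each character after normalization:

```python
import unicodedata

def show_decomposition(text):
    """Return (character, code point, Unicode name) for each char after NFKD."""
    decomposed = unicodedata.normalize('NFKD', text)
    return [(ch, f'U+{ord(ch):04X}', unicodedata.name(ch)) for ch in decomposed]

for row in show_decomposition('ö'):
    print(row)
```

"ö" becomes "o" (U+006F) followed by U+0308 COMBINING DIAERESIS, and only the latter is non-ASCII, so `encode('ascii', 'ignore')` keeps the base letter. The "K" (compatibility) part of NFKD also folds characters such as the ligature "ﬁ" into "f" and "i".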

score 2 · Accepted Answer · answered Apr 04 '18 at 13:50


Here is what you need: for converting to ASCII, you might want to try `unicodedata`.

import unicodedata

data = u'Deutsche Börse'

print(unicodedata.normalize('NFKD', data).encode('ascii', 'ignore'))

Output

b'Deutsche Borse'

answered Apr 04 '18 at 13:50

toheedNiaz


A good answer, but, unfortunately, due to working on an answer in parallel unbeknownst to each other, an effective duplicate of Rakesh's (who happened to press `Post Your Answer` a mere minute or so earlier). – mklement0 Apr 04 '18 at 13:59
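Two caveats apply to both answers. The result of `.encode()` is a `bytes` object, so `.decode('ascii')` is needed to get a `str` back. And `'ignore'` silently drops every character that has no decomposition, such as "ß", "ø" or "Ł". A minimal sketch, assuming you want to know what was lost (the function `strip_accents` is a hypothetical name), removes only combining marks and reports the leftovers:

```python
import unicodedata

def strip_accents(text):
    """Remove combining marks after NFKD; return (ascii_text, dropped_chars)."""
    decomposed = unicodedata.normalize('NFKD', text)
    # Drop characters with a nonzero combining class, e.g. U+0308 COMBINING DIAERESIS
    base = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still non-ASCII had no decomposition (e.g. 'ß', 'ø', 'Ł')
    dropped = [ch for ch in base if not ch.isascii()]
    ascii_text = base.encode('ascii', 'ignore').decode('ascii')
    return ascii_text, dropped

print(strip_accents('Deutsche Börse'))  # ('Deutsche Borse', [])
print(strip_accents('Łódź, Straße'))    # ('odz, Strae', ['Ł', 'ß'])
```

The second example shows the limitation the question anticipates: accented letters survive as their base letter, but letters without a decomposition vanish unless they are mapped by hand.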
