I have to convert all accented Latin characters to their corresponding unaccented English letters. Can I do this in Python, or is there a mapping available?

In other words, I need to map accented Unicode characters to their plain ASCII equivalents.

Ramírez Sánchez should be converted to Ramirez Sanchez.

Bart
AlgoMan
  • English alphabet *is* the latin alphabet. Can you be more specific? – Karl Bielefeldt Dec 22 '10 at 18:55
  • possible duplicate of [What is the best way to remove accents in a python unicode string?](http://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string) – Lennart Regebro Dec 22 '10 at 18:59

1 Answer

It looks like what you want is accent removal. You can do this with:

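import unicodedata

def strip_accents(text):
    # Decompose each character (NFKD), then drop the combining marks (category 'Mn').
    return ''.join(char for char in
                   unicodedata.normalize('NFKD', text)
                   if unicodedata.category(char) != 'Mn')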

>>> strip_accents('áéíñóúü')
'aeinouu'
>>> strip_accents('Ramírez Sánchez')
'Ramirez Sanchez'
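
To see why the filter on category 'Mn' removes the accent, it may help to look at what NFKD normalization produces for a single accented letter. The snippet below is only an illustrative sketch; the variable name `decomposed` is mine, not part of the answer.

```python
import unicodedata

# '\u00ed' is the precomposed letter 'í' (LATIN SMALL LETTER I WITH ACUTE).
# NFKD splits it into a base letter followed by a combining mark.
decomposed = unicodedata.normalize('NFKD', '\u00ed')

for char in decomposed:
    # 'Ll' is a lowercase letter; 'Mn' is a nonspacing (combining) mark.
    print(repr(char), unicodedata.category(char), unicodedata.name(char))
```

The base letter has category 'Ll' and is kept, while the combining acute accent has category 'Mn' and is dropped.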

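strip_accents works fine for Spanish, but it does not work for every language: some letters, such as 'ø', have no Unicode decomposition into a base letter plus a combining mark, so they pass through unchanged.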

>>> strip_accents('ø')
'ø'
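
One possible workaround for such letters is to translate them with an explicit table before stripping the combining marks. The following is a minimal sketch, not a complete solution: the `FALLBACK` table and the function name `to_ascii_letters` are hypothetical, the table is deliberately incomplete, and the chosen replacements are assumptions (some languages would prefer 'oe' for 'ø', for example).

```python
import unicodedata

# Hypothetical, deliberately incomplete table for letters that have no
# Unicode decomposition; extend or change it for the languages you need.
FALLBACK = str.maketrans({
    'ø': 'o', 'Ø': 'O',
    'ł': 'l', 'Ł': 'L',
    'đ': 'd', 'Đ': 'D',
    'æ': 'ae', 'Æ': 'AE',
    'ß': 'ss',
})

def to_ascii_letters(text):
    # First replace the letters that normalization cannot decompose.
    text = text.translate(FALLBACK)
    # Then decompose the rest and drop the combining marks, as in strip_accents.
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')

print(to_ascii_letters('ø'))
print(to_ascii_letters('Ramírez Sánchez'))
```

Characters that are neither in the table nor decomposable are still returned unchanged, so the result is not guaranteed to be pure ASCII.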
dan04