I am working on a football dataset that contains names with accented letters. How do I replace the special characters in my dataset? Here are some examples of these "exotic" names:

'Lionel Andrés Messi Cuccittini', 'Neymar da Silva Santos Junior', 'Luis Alberto Suárez Díaz', 'David De Gea Quintana', 'Zlatan Ibrahimović'

The special characters are é, á, ć, etc. (letters with a diacritic mark above them). I want to change them to their "base" form: ć becomes c, á becomes a, and so on.

Many thanks in advance!

user12575866
  • 2
    Does this answer your question? [What is the best way to remove accents in a Python unicode string?](https://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string) – Alexander van Oostenrijk Dec 21 '19 at 13:57
  • 4
    If it is not absolutely necessary to replace them, leave the names as they are. In all these languages the "special" characters are not equivalent to the similar-looking "normal" characters. For example, the German city of Düsseldorf (village on the Düssel) is not Dusseldorf (village of fools). – Klaus D. Dec 21 '19 at 14:15

3 Answers

You could try this:

# playernames is assumed to be a list of name strings
for i in range(len(playernames)):
    playernames[i] = playernames[i].replace("é", "e")

and then, of course, add the other characters in the same way.
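
If many characters need replacing, chaining one `replace` call per character gets tedious. A minimal sketch of the same idea using a single translation table built with `str.maketrans` follows. The name `ACCENT_MAP` and the four mappings in it are illustrative assumptions, not a complete list, so extend the table with whatever characters actually occur in the data.

```python
# Hypothetical mapping: only a few characters are listed here for illustration.
ACCENT_MAP = str.maketrans({"é": "e", "á": "a", "í": "i", "ć": "c"})

playernames = ['Luis Alberto Suárez Díaz', 'Zlatan Ibrahimović']

# translate() applies every mapping in one pass over each string.
playernames = [name.translate(ACCENT_MAP) for name in playernames]
print(playernames)
```

Any character missing from the table is left unchanged, so a gap in the mapping shows up as a remaining accent rather than as an error.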

Axblert

You can use the third-party unidecode package (pip install unidecode):

import unidecode
special_str = [u'Lionel Andrés Messi Cuccittini', u'Neymar da Silva Santos Junior', u'Luis Alberto Suárez Díaz', u'David De Gea Quintana', u'Zlatan Ibrahimović']
for item in special_str:
    print(unidecode.unidecode(item))

The output will be:

Lionel Andres Messi Cuccittini
Neymar da Silva Santos Junior
Luis Alberto Suarez Diaz
David De Gea Quintana
Zlatan Ibrahimovic
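
If installing a third-party package is not an option, the approach from the question linked in the comments can be sketched with the standard library's `unicodedata` module: decompose each character and drop the combining marks. The helper name `strip_accents` is an assumption made for illustration. Note that this only handles letters that decompose into a base letter plus a mark. Characters such as ø or ł have no such decomposition and pass through unchanged.

```python
import unicodedata

def strip_accents(text):
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

names = ['Lionel Andrés Messi Cuccittini', 'Luis Alberto Suárez Díaz', 'Zlatan Ibrahimović']
cleaned = [strip_accents(name) for name in names]
print(cleaned)
```
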
Bikramjeet Singh

You can try this:

import unidecode
# your_string is assumed to hold the name to convert
new_string = unidecode.unidecode(your_string)
JoWyN