Optimize the way to remove accents in Python

Question

I am removing accents and special characters from a DataFrame column, but the way I am doing it does not seem optimal to me. How can I improve it?

Thanks.

Code:

import pandas as pd

m = pd.read_excel('file.xlsx')

print(m)
m['hola']=m['hola'].str.replace(r"\W","")
m['hola']=m['hola'].str.replace(r"á","a")
m['hola']=m['hola'].str.replace(r"é","e")
m['hola']=m['hola'].str.replace(r"í","i")
m['hola']=m['hola'].str.replace(r"ó","o")
m['hola']=m['hola'].str.replace(r"ú","u")
m['hola']=m['hola'].str.replace(r"Á","A")
m['hola']=m['hola'].str.replace(r"É","E")
m['hola']=m['hola'].str.replace(r"Í","I")
m['hola']=m['hola'].str.replace(r"Ó","O")
m['hola']=m['hola'].str.replace(r"Ú","U")
print(m)

Note that `.replace(r"\W","")` replaces the literal substring `\W` when the pattern is not treated as a regular expression. Use `re.sub` (or pass `regex=True` to pandas' `.str.replace`) to treat it as a regular expression. – Ry-, Aug 08 '22 at 18:50
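To illustrate the point of this comment, here is a minimal sketch using only plain Python strings and the standard `re` module. The sample text is made up for the example; it is not from the asker's file.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "¡Hola, señor López!"

# str.replace searches for the two literal characters backslash and W.
# They do not occur in the text, so nothing changes.
literal = text.replace(r"\W", "")

# re.sub interprets \W as "any non-word character", so punctuation and
# whitespace are removed. Accented letters count as word characters.
cleaned = re.sub(r"\W", "", text)

print(literal)
print(cleaned)
```

On a pandas column the equivalent would be passing `regex=True` to `.str.replace`. Whether the pattern is treated as a regular expression by default depends on the pandas version, so passing the argument explicitly is the safer choice.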

score 1 · Answer 1 · answered Aug 08 '22 at 18:43

You could make a dictionary with the special characters as the keys and their replacements as the values:

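# Map each accented character to its replacement
# (entries taken from the replacements listed in the question).
d = {"á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u",
     "Á": "A", "É": "E", "Í": "I", "Ó": "O", "Ú": "U"}
x = "árwwwe"
for character in x:
    # Membership tests on a dict check its keys directly.
    if character in d:
        x = x.replace(character, d[character])
print(x)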

Output:

arwwwe
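As a variation on the dictionary idea above, the loop can probably be avoided with `str.maketrans` and `str.translate`, which apply all single-character replacements in one pass. This is only a sketch: the names `accent_map`, `table` and `strip_accents` are made up for the example, and the mapping is assumed to cover just the characters listed in the question.

```python
# Mapping built from the replacements in the question (assumed complete).
accent_map = {
    "á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u",
    "Á": "A", "É": "E", "Í": "I", "Ó": "O", "Ú": "U",
}

# str.maketrans turns a single-character mapping into a translation table.
table = str.maketrans(accent_map)


def strip_accents(text):
    """Replace every mapped character in a single pass over the string."""
    return text.translate(table)


result = strip_accents("árwwwe ÁÉÍÓÚ")
print(result)
```

For the DataFrame in the question, the same table should work with `m['hola'].str.translate(table)`, which would replace the ten chained `.str.replace` calls with one.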

answered Aug 08 '22 at 18:43

Ryan

Dictionaries are probably the way to go, since a dict is a hash map implementation. – Noah Aug 08 '22 at 18:52
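If keeping the dictionary up to date by hand becomes tedious, a different technique from the one in the answer is Unicode normalization with the standard `unicodedata` module. The sketch below is an alternative, not the answerer's method, and the function name `strip_accents_nfd` is made up. Note that it behaves differently from the question's code: it strips every combining mark, so `ñ` becomes `n` as well.

```python
import unicodedata


def strip_accents_nfd(text):
    """Decompose characters (NFD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


result = strip_accents_nfd("árwwwe Ñandú")
print(result)
```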
