I am removing accents and special characters from a DataFrame column, but the way I am doing it does not seem optimal to me. How can I improve it?

Thanks.

Code:

import pandas as pd

m = pd.read_excel('file.xlsx')

print(m)
m['hola']=m['hola'].str.replace(r"\W","")
m['hola']=m['hola'].str.replace(r"á","a")
m['hola']=m['hola'].str.replace(r"é","e")
m['hola']=m['hola'].str.replace(r"í","i")
m['hola']=m['hola'].str.replace(r"ó","o")
m['hola']=m['hola'].str.replace(r"ú","u")
m['hola']=m['hola'].str.replace(r"Á","A")
m['hola']=m['hola'].str.replace(r"É","E")
m['hola']=m['hola'].str.replace(r"Í","I")
m['hola']=m['hola'].str.replace(r"Ó","O")
m['hola']=m['hola'].str.replace(r"Ú","U")
print(m)
Breik
  • Note that `.replace(r"\W","")` replaces the literal substring `\W` unless regex matching is enabled. Pass `regex=True` to `.str.replace` (or use `re.sub` on plain strings) to treat it as a regular expression. – Ry- Aug 08 '22 at 18:50
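
To illustrate the difference the comment points at, here is a minimal sketch using only the standard library. The sample string is made up for the example, and plain Python strings stand in for the DataFrame column.

```python
import re

sample = "¡Hola, señor!"  # hypothetical sample value

# str.replace searches for the two-character substring backslash + W,
# which does not occur here, so the string comes back unchanged
literal = sample.replace(r"\W", "")

# re.sub interprets \W as "any non-word character" and removes
# punctuation and whitespace
cleaned = re.sub(r"\W", "", sample)

print(literal)
print(cleaned)
```

Accented letters such as `ñ` count as word characters, so `\W` on its own does not strip accents.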

1 Answer

You could make a dictionary with the special characters as the keys and their replacements as the values:

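d = {}
d["á"] = "a"  # ... etc. for the remaining accented characters
x = "árwwwe"
# The loop walks the original string; x is rebound to the updated copy
for character in x:
    if character in d:
        x = x.replace(character, d[character])
print(x)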

Output:

arwwwe
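
As a possible refinement of the dictionary idea, the sketch below builds a translation table once and applies it in a single pass with `str.translate`. It also shows a swapped-in alternative, Unicode normalization via `unicodedata`, which is not the approach above and handles more characters than the question lists (for example, it would also turn `ñ` into `n`). The names `ACCENT_TABLE`, `replace_with_table`, `strip_accents`, and `clean_text` are hypothetical helpers introduced only for this illustration.

```python
import re
import unicodedata

# Hypothetical table built once; it covers only the vowels listed in the question
ACCENT_TABLE = str.maketrans("áéíóúÁÉÍÓÚ", "aeiouAEIOU")


def replace_with_table(text: str) -> str:
    """Replace every mapped character in a single pass over the string."""
    return text.translate(ACCENT_TABLE)


def strip_accents(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_text(text: str) -> str:
    """Strip accents, then remove every non-word character."""
    return re.sub(r"\W", "", strip_accents(text))


print(replace_with_table("árwwwe"))
print(strip_accents("canción"))
print(clean_text("¡Está bien!"))
```

On the DataFrame from the question, this could be applied as `m['hola'] = m['hola'].map(clean_text)`, assuming every value in the column is a string.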
Ryan