I have the following df:

import pandas as pd

df = pd.DataFrame({'refid':['Carne Pro Cárné 1.7 Kg','Chíckén Sopraval 1.9 Kg','Groúnded Beef Super Cérdo 1.0 Kg','Turkey Áriztía 1.2 kg','Wágyú Exportación 400 g'],
                   'Marca':['PRO CARNE','SOPRAVAL','SUPER CERDO','ARIZTIA','EXPORTACION'],
                   'Mult':[4.0,5.2,5.6,5.9,4.9]})

And I need to replace the Spanish accented vowels (á, é, í, ó, ú) with (a, e, i, o, u) in the refid column.

I'm using the solution from this post: How to replace multiple substrings of a string?

My code:

import re

rep = {'á':'a','é':'e','í':'i','ó':'o','ú':'u','Á':'A','É':'E','Í':'I','Ó':'O','Ú':'U'}
rep = dict((re.escape(k), v) for k, v in rep.items())
pattern = re.compile("|".join(rep.keys()))
df['refid'] = pattern.sub(lambda m: rep[re.escape(m.group(0))], str(df['refid']))

Result:

    refid                           Marca      Mult
0   0 Carne Pro Carne 1.7 Kg\n1 ... PRO CARNE   4.0
1   0 Carne Pro Carne 1.7 Kg\n1 ... SOPRAVAL    5.2
2   0 Carne Pro Carne 1.7 Kg\n1 ... SUPER CERDO 5.6
3   0 Carne Pro Carne 1.7 Kg\n1 ... ARIZTIA     5.9
4   0 Carne Pro Carne 1.7 Kg\n1 ... EXPORTACION 4.9

df['refid'][0]
'0              Carne Pro Carne 1.7 Kg\n1             Chicken Sopraval 1.9 Kg\n2    Grounded Beef Super Cerdo 1.0 Kg\n3               Turkey Ariztia 1.2 kg\n4             Wagyu Exportacion 400 g\nName: refid, dtype: object'

As you can see, instead of applying the function row by row, it takes the entire column and replaces each refid value with the result.

Any help? Or is there another way to do this?

Thanks

imatiasmb

1 Answer

You can pass the rep dictionary directly as the first argument to Series.replace and set regex=True, so that each key is treated as a pattern matched inside the strings rather than compared against whole cell values:

rep = {'á':'a','é':'e','í':'i','ó':'o','ú':'u','Á':'A','É':'E','Í':'I','Ó':'O','Ú':'U'}
df['refid'] = df['refid'].replace(rep, regex=True)

Output:

>>> df['refid']
0              Carne Pro Carne 1.7 Kg
1             Chicken Sopraval 1.9 Kg
2    Grounded Beef Super Cerdo 1.0 Kg
3               Turkey Ariztia 1.2 kg
4             Wagyu Exportacion 400 g
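For reference, the code in the question fails because str(df['refid']) turns the whole Series into its printed representation, one multi-line string. pattern.sub then returns a single string, and assigning it to df['refid'] broadcasts that string to every row. A minimal sketch that keeps the compiled-pattern approach but applies it to each value separately with Series.map (strip_accents is a hypothetical helper name, and the data is the df from the question):

```python
import re

import pandas as pd

df = pd.DataFrame({'refid': ['Carne Pro Cárné 1.7 Kg', 'Chíckén Sopraval 1.9 Kg',
                             'Groúnded Beef Super Cérdo 1.0 Kg', 'Turkey Áriztía 1.2 kg',
                             'Wágyú Exportación 400 g'],
                   'Marca': ['PRO CARNE', 'SOPRAVAL', 'SUPER CERDO', 'ARIZTIA', 'EXPORTACION'],
                   'Mult': [4.0, 5.2, 5.6, 5.9, 4.9]})

rep = {'á': 'a', 'é': 'e', 'í': 'i', 'ó': 'o', 'ú': 'u',
       'Á': 'A', 'É': 'E', 'Í': 'I', 'Ó': 'O', 'Ú': 'U'}

# One alternation pattern built from the escaped keys. The replacement looks up
# the matched text in the original (unescaped) dict, so no re.escape is needed there.
pattern = re.compile('|'.join(map(re.escape, rep)))


def strip_accents(text: str) -> str:
    """Hypothetical helper: replace every accented vowel in a single string."""
    return pattern.sub(lambda m: rep[m.group(0)], text)


# Series.map calls strip_accents once per cell, instead of once on str(Series).
df['refid'] = df['refid'].map(strip_accents)
print(df['refid'].tolist())
```

The Marca and Mult columns are left untouched; only refid is rewritten, one value at a time.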

If you want to normalize the strings and convert every character to its closest ASCII equivalent (including punctuation), you can install unidecode via pip install unidecode and then use:

import unidecode
df['refid'] = df['refid'].apply(unidecode.unidecode)
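If installing a package is not an option, a similar result for accented letters can be sketched with the standard-library unicodedata module instead of unidecode: NFD normalization splits each accented character into its base letter plus a combining mark, and the marks are then dropped. This is a swapped-in technique, not what unidecode does internally, and remove_diacritics is a hypothetical helper name. Unlike unidecode, it leaves characters without a decomposition (such as ß) unchanged, and it also strips the tilde from ñ and the diaeresis from ü, which may or may not be wanted for Spanish text.

```python
import unicodedata

import pandas as pd


def remove_diacritics(text: str) -> str:
    """Hypothetical helper: drop combining accent marks from a string."""
    # NFD splits e.g. 'é' into 'e' followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize('NFD', text)
    # Keep only characters that are not combining marks.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


df = pd.DataFrame({'refid': ['Carne Pro Cárné 1.7 Kg', 'Chíckén Sopraval 1.9 Kg',
                             'Groúnded Beef Super Cérdo 1.0 Kg', 'Turkey Áriztía 1.2 kg',
                             'Wágyú Exportación 400 g']})
df['refid'] = df['refid'].map(remove_diacritics)
print(df['refid'].tolist())
```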
Wiktor Stribiżew