I have the following df:
import re
import pandas as pd
df = pd.DataFrame({'refid':['Carne Pro Cárné 1.7 Kg','Chíckén Sopraval 1.9 Kg','Groúnded Beef Super Cérdo 1.0 Kg','Turkey Áriztía 1.2 kg','Wágyú Exportación 400 g'],
'Marca':['PRO CARNE','SOPRAVAL','SUPER CERDO','ARIZTIA','EXPORTACION'],
'Mult':[4.0,5.2,5.6,5.9,4.9]})
I need to replace the Spanish accented vowels (á, é, í, ó, ú) with their unaccented equivalents (a, e, i, o, u) in the refid
column.
I'm using the solution from this post: How to replace multiple substrings of a string?
My code:
rep = {'á':'a','é':'e','í':'i','ó':'o','ú':'u','Á':'A','É':'E','Í':'I','Ó':'O','Ú':'U'}
rep = dict((re.escape(k), v) for k, v in rep.items())
pattern = re.compile("|".join(rep.keys()))
df['refid'] = pattern.sub(lambda m: rep[re.escape(m.group(0))], str(df['refid']))
Result:
                             refid        Marca  Mult
0  0 Carne Pro Carne 1.7 Kg\n1 ...    PRO CARNE   4.0
1  0 Carne Pro Carne 1.7 Kg\n1 ...     SOPRAVAL   5.2
2  0 Carne Pro Carne 1.7 Kg\n1 ...  SUPER CERDO   5.6
3  0 Carne Pro Carne 1.7 Kg\n1 ...      ARIZTIA   5.9
4  0 Carne Pro Carne 1.7 Kg\n1 ...  EXPORTACION   4.9
df['refid'][0]
'0 Carne Pro Carne 1.7 Kg\n1 Chicken Sopraval 1.9 Kg\n2 Grounded Beef Super Cerdo 1.0 Kg\n3 Turkey Ariztia 1.2 kg\n4 Wagyu Exportacion 400 g\nName: refid, dtype: object'
As you can see, instead of applying the substitution row by row, it converts the entire refid column into a single string and replaces every refid
value with that string.
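For comparison, here is a minimal sketch of what a row-by-row version of the same regex approach might look like. It assumes the cause is that str(df['refid']) turns the whole Series into one string, so the substitution is applied to each value through Series.apply instead. The helper name strip_accents and the two-row sample frame are only illustrative.

```python
import re
import pandas as pd

# Same mapping as in the question.
rep = {'á': 'a', 'é': 'e', 'í': 'i', 'ó': 'o', 'ú': 'u',
       'Á': 'A', 'É': 'E', 'Í': 'I', 'Ó': 'O', 'Ú': 'U'}

# Build one alternation pattern from the keys; the lookup below uses the
# unescaped match, so rep itself does not need to be re-keyed.
pattern = re.compile("|".join(re.escape(k) for k in rep))


def strip_accents(text: str) -> str:
    """Hypothetical helper: replace each accented vowel in one string."""
    return pattern.sub(lambda m: rep[m.group(0)], text)


# Small sample taken from the refid values above.
df = pd.DataFrame({'refid': ['Carne Pro Cárné 1.7 Kg',
                             'Wágyú Exportación 400 g']})

# apply() calls strip_accents once per row, so each value is handled on its own.
df['refid'] = df['refid'].apply(strip_accents)
print(df['refid'].tolist())
```

The key difference from my code is that the function receives one refid string at a time rather than the printed representation of the whole column.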
Any help? Or maybe another method for doing this?
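One other method that might fit, sketched here under the assumption that only the ten accented vowels listed above need handling, is a character translation table built with str.maketrans and applied through the pandas Series.str.translate accessor. The variable name accent_map and the sample rows are illustrative only.

```python
import pandas as pd

# Translation table: each accented vowel maps to its unaccented counterpart.
# Both strings must have the same length and matching order.
accent_map = str.maketrans('áéíóúÁÉÍÓÚ', 'aeiouAEIOU')

# Small sample taken from the refid values above.
df = pd.DataFrame({'refid': ['Chíckén Sopraval 1.9 Kg',
                             'Turkey Áriztía 1.2 kg']})

# .str.translate applies the table to every element of the column.
df['refid'] = df['refid'].str.translate(accent_map)
print(df['refid'].tolist())
```

This avoids the regex entirely, but it only covers characters that are explicitly listed in the table.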
Thanks