Multiple char removal
You may use a regex to perform multiple character replacement.
The construct you are interested in can be a character class or a grouping with alternation.
Character classes look like [...], with characters, character ranges or shorthand character classes inside them, and alternation groups look like (...|...|...).
Literal chars can cause problems in both constructs because some of them are regex metacharacters, but re.escape comes to the rescue: it makes sure the chars you pass to the regex are treated as literal chars.
See a Python 3 demo:
>>> import re
>>> charsToRemove = ["$", ".", "€"]
>>> s='23.889,45 €'
>>> print(re.sub("|".join([re.escape(x) for x in charsToRemove]), "", s)) # Alternation group
23889,45
>>> print(re.sub(r"[{}]+".format("".join([re.escape(x) for x in charsToRemove])), "", s)) # Character class
23889,45
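If the same set of chars has to be removed from many strings, the character class can be compiled once and reused. The following is a minimal sketch of that idea; make_remover and remove_currency are hypothetical names, and the list of chars is assumed to be non-empty (an empty list would produce the invalid pattern []+).

```python
import re

def make_remover(chars):
    """Return a function that strips every character in `chars` from a string."""
    # Escape each char so regex metacharacters such as ']' or '^' stay literal.
    pattern = re.compile("[{}]+".format("".join(re.escape(c) for c in chars)))
    return lambda text: pattern.sub("", text)

remove_currency = make_remover(["$", ".", "€"])
print(remove_currency("23.889,45 €"))  # 23889,45 (the trailing space is kept)
print(remove_currency("$1.000"))       # 1000
```

Because every char goes through re.escape, chars that are special inside a character class, such as ], ^ or -, can be passed in as well.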
In Pandas, you'd use
df['col'] = df['col'].str.replace(r"[{}]+".format("".join([re.escape(x) for x in charsToRemove])), "", regex=True)  # str.replace has no inplace argument, so assign the result back
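Note that the character class approach ([...]+) will generally work faster, since it consumes a whole run of matching chars in one step instead of trying each alternative in turn.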
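The speed difference is easy to measure on your own data. Below is a rough sketch using timeit; the sample string and the repetition counts are arbitrary assumptions, and the timings will vary by machine and Python version, so no figures are shown.

```python
import re
import timeit

chars_to_remove = ["$", ".", "€"]
escaped = [re.escape(c) for c in chars_to_remove]
alternation = re.compile("|".join(escaped))
char_class = re.compile("[{}]+".format("".join(escaped)))

sample = "23.889,45 € " * 1000  # hypothetical test input

# Both patterns must give the same result before timing means anything.
assert alternation.sub("", sample) == char_class.sub("", sample)

for name, pattern in [("alternation", alternation), ("character class", char_class)]:
    seconds = timeit.timeit(lambda: pattern.sub("", sample), number=200)
    print("{}: {:.4f}s".format(name, seconds))
```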
Multiple replacements
You may consider creating a dictionary of replacements and then using it with the Pandas replace method:
>>> import re  # only needed if you build the keys with re.escape
>>> import pandas as pd
>>> repl_list = {'€': '$', r'\.': '', ',': '.'}  # remove '.' first, then turn ',' into '.'
>>> col_list = ['23.889,45 €']
>>> frame = pd.DataFrame(col_list, columns=['col'])
>>> frame['col'] = frame['col'].replace(repl_list, regex=True)  # assign back; inplace on a selected column is unreliable
>>> frame['col']
0 23889.45 $
To make it work, you must pass the regex=True argument, as all the keys in repl_list are regular expressions. Do not forget to escape special regex chars in there. See "What special characters must be escaped in regular expressions?". Or, you may write r'\.' as re.escape('.'), which requires import re.
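Outside Pandas, the same dictionary idea can be sketched with plain re.sub. This is only an illustration and not a description of what Pandas does internally; replace_all is a hypothetical helper that applies the patterns one after another in dict order, so each substitution sees the result of the previous one and the order of the keys matters.

```python
import re

def replace_all(text, replacements):
    """Apply each pattern -> replacement pair in dict order, one after another."""
    for pattern, repl in replacements.items():
        text = re.sub(pattern, repl, text)
    return text

repl_list = {
    "€": "$",
    re.escape("."): "",  # same as r'\.': drop the thousands separator first
    ",": ".",            # then turn the decimal comma into a dot
}
print(replace_all("23.889,45 €", repl_list))  # 23889.45 $
```

If the ',' entry came before the '.' entry, the freshly inserted decimal dot would be removed as well and the result would be 2388945 $.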