I am parsing CSV files and would like to remove non-ASCII characters when they appear. Actually, I only need the digits, but when I try to remove the non-digit characters, I get a UnicodeEncodeError.
I have the following function:
import re

def remove_non_ascii(text):
    return ''.join(re.findall(r"\d+", str(text)))
I also tried this (just to remove the non-ASCII characters):
def remove_non_ascii(text):
    return ''.join(i for i in str(text) if ord(i) < 128)
When I print the result of the following call, I get the correct output (for both functions):
print(remove_non_ascii('E-Mail Adresse des Empfängers'))
However, when I apply the function to the dataframe column with df[col] = df[col].apply(remove_non_ascii), I get the UnicodeEncodeError.
What am I doing wrong?
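One possible cause, assuming this runs on Python 2 and the column holds unicode values: str(text) implicitly encodes a unicode object as ASCII, which fails on characters such as "ä". The literal passed to print would be a byte string, so str() leaves it untouched. The sketch below drops the str() call. The name keep_digits and the sample values are made up for illustration, and a plain list stands in for the dataframe column.

```python
# -*- coding: utf-8 -*-
import re

def keep_digits(text):
    # Hypothetical replacement for remove_non_ascii: it never calls str()
    # on a unicode value, so no implicit ASCII encoding takes place.
    if isinstance(text, bytes):
        # Byte strings are decoded explicitly (UTF-8 is an assumption).
        text = text.decode("utf-8", errors="ignore")
    elif not isinstance(text, type(u"")):
        # Numbers and other objects are formatted into a unicode string.
        text = u"%s" % (text,)
    # [0-9] keeps ASCII digits only; \d could also match other Unicode digits.
    return u"".join(re.findall(r"[0-9]+", text))

# Made-up sample values standing in for the dataframe column.
values = [u"E-Mail Adresse des Empfängers", u"Tel. 0123 456", 42]
cleaned = [keep_digits(v) for v in values]
print(cleaned)

# With pandas the call would look like: df[col] = df[col].apply(keep_digits)
```

If the values really are unicode objects, applying a function of this shape to the column should avoid the error, although I have not checked it against the original CSV files.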