How to remove strange encoding from pandas df

Question

I have the following df:

import pandas as pd

df = pd.DataFrame({"name" : ["a", "b", "c"], "value" : ['1\xa0412', 4, 2]})

I would like to replace '1\xa0412' with 1. I tried this:

df['value'] = df['value'].str.replace(r'\\.*', '', regex=True)

But it does not work. How can I solve it, please?

score 1 · Accepted Answer · answered Aug 02 '22 at 12:43

Try:

df.value = df.value.apply(repr).str.replace(r"(\\.*)|\'", r"", regex=True)

Result:

    name    value
0   a       1
1   b       4
2   c       2

Be careful, though: after this replacement the value column has dtype object, and its entries are strings. If you need another dtype, such as an integer, you have to convert the column explicitly.
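Some background on why the attempt in the question finds nothing: in a Python string literal, '\xa0' is a single non-breaking space character, not a backslash followed by text, so the pattern \\.* has no backslash to match. The apply(repr) step works because repr writes that character out as an escape sequence. In addition, .str methods return NaN for non-string entries such as 4 and 2. A minimal sketch of a more direct route, assuming the part before the non-breaking space is always the number you want to keep:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "value": ['1\xa0412', 4, 2]})

# Cast every entry to str first, because .str methods yield NaN
# for non-string elements in an object column.
as_text = df["value"].astype(str)

# Keep only the part before the non-breaking space, then convert
# the column to integers so it no longer has dtype object.
df["value"] = as_text.str.split("\xa0").str[0].astype(int)

print(df)
```

If the column can hold entries that are not valid integers after the split, astype(int) will raise, so this conversion step is an assumption about the data rather than a general fix.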

sebastian · Answer 2 · 2022-08-02T10:05:12.053

0

Try processing the data with the unidecode library first, and then apply the replacement. This worked for me on a similar problem.
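A hedged sketch of the normalize-then-replace idea, using the standard library's unicodedata module in place of unidecode (a swapped-in technique, not the library this answer names). NFKC normalization turns the non-breaking space U+00A0 into an ordinary space, so a second step is still needed to keep only the leading 1. The helper name leading_number is hypothetical.

```python
import unicodedata

def leading_number(value):
    """Return the integer that precedes the first space in value."""
    # NFKC maps the non-breaking space U+00A0 to a regular space U+0020.
    text = unicodedata.normalize("NFKC", str(value))
    # Keep only the part before the first space, e.g. "1 412" -> "1".
    return int(text.split(" ")[0])

values = ['1\xa0412', 4, 2]
cleaned = [leading_number(v) for v in values]
print(cleaned)
```

With pandas, a helper like this could be passed to df['value'].map(...) to clean the whole column in one step.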

edited Aug 02 '22 at 10:05

answered Aug 02 '22 at 10:04

Like this? https://stackoverflow.com/questions/44539421/pandas-apply-unidecode-to-several-columns It does not work for me. – vojtam Aug 02 '22 at 10:08
Yes, but apply it to only those columns with the strange encoding. If it does not work - what is the error? – sebastian Aug 02 '22 at 10:13
no error, but nothing changes. Try my reproducible example, please – vojtam Aug 02 '22 at 10:24