
I have the following df:

import pandas as pd

df = pd.DataFrame({"name" : ["a", "b", "c"], "value" : ['1\xa0412', 4, 2]})

I would like to replace '1\xa0412' with 1. I tried this:

df['value'] = df['value'].str.replace(r'\\.*', '', regex=True)

But it does not work. How can I solve it, please?

vojtam

2 Answers


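Try converting each value to its repr first. In the repr, the non-breaking space appears as the literal text \xa0, so a pattern that matches a backslash can find it: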

df.value = df.value.apply(repr).str.replace(r"(\\.*)|\'", r"", regex=True)

Result:

    name    value
0   a       1
1   b       4
2   c       2

Be careful, though: the value column still has dtype object. If you want another dtype, you have to convert the column yourself.
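As a possible alternative, here is a minimal sketch that matches the non-breaking space (U+00A0) directly instead of going through repr, and then converts the column to a numeric dtype. It assumes the goal is to keep only the digits before the non-breaking space, as the question asks; the variable name `cleaned` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "value": ["1\xa0412", 4, 2]})

# The column mixes str and int, so cast everything to str first;
# otherwise the .str accessor returns NaN for the non-string entries.
# The pattern \xa0.* matches the non-breaking space and everything after it.
cleaned = df["value"].astype(str).str.replace(r"\xa0.*", "", regex=True)

# Convert the cleaned strings from object dtype to a numeric dtype.
df["value"] = pd.to_numeric(cleaned)

print(df)
print(df["value"].dtype)
```

The original attempt with r'\\.*' looks for a literal backslash, but the string contains the single character U+00A0, not a backslash followed by "xa0". If the intended number is actually 1412, replacing only r"\xa0" with an empty string would keep the remaining digits.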

99_m4n

Try using the unidecode library to process the data first, and then try to replace it. It worked for me for a similar problem.
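The general idea can be sketched with the standard library's `unicodedata` module, used here as a stand-in for unidecode so the example has no extra dependency. This is a swapped-in technique, not the unidecode API: NFKC normalization turns the non-breaking space into a regular space, so a separate step is still needed to cut off the rest of the string. The helper name `strip_after_space` is hypothetical.

```python
import unicodedata


def strip_after_space(value):
    """Normalize to NFKC, then keep only the text before the first space."""
    # NFKC maps the non-breaking space U+00A0 to a regular space U+0020.
    text = unicodedata.normalize("NFKC", str(value))
    return text.split(" ", 1)[0]


values = ["1\xa0412", 4, 2]
cleaned = [strip_after_space(v) for v in values]
print(cleaned)  # ['1', '4', '2']
```

A function like this could be passed to `Series.map` on the affected column; the results are strings, so a numeric conversion would still be needed afterwards.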

sebastian
  • Like this? https://stackoverflow.com/questions/44539421/pandas-apply-unidecode-to-several-columns It does not work for me – vojtam Aug 02 '22 at 10:08
  • Yes, but apply it only to the columns with the strange encoding. If it does not work, what is the error? – sebastian Aug 02 '22 at 10:13
  • No error, but nothing changes. Please try my reproducible example. – vojtam Aug 02 '22 at 10:24