I am trying to clean my data by converting every column except the first (columns[1:]) to float.

for col_i in new_col_titles[1:]:
    df[col_i] = df[col_i].astype(float)

However, I get the following error.

ValueError: could not convert string to float: '\xa0$ 25,507,036'

I have tried to use df = df.replace('\x0$',''), but so far I have had no luck.

JPWilson
  • Can you try `df[col_i] = df[col_i].str.replace('\x0$', '').astype(float)`? – Marat Aug 30 '20 at 19:06
  • SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-2: truncated \xXX escape – JPWilson Aug 30 '20 at 19:10
  • I missed `a` in `\xa0$` – Marat Aug 30 '20 at 19:30
  • With `\xa0$` it still returns \xa0$ 25,507,036. When I used just `\xa0`, I got this error instead: "could not convert string to float: '$ 25,507,036'" – JPWilson Aug 30 '20 at 19:36
  • I managed to remove the \xa0, but it now shows '$ 25507036', '$ 21550568', '$ 21576850' when I do print(df.astype(str).values.tolist()). I can't seem to use replace to remove the dollar sign, though. – JPWilson Aug 30 '20 at 20:19

1 Answer

I solved it with a multistep process. I'm sure someone can clean this up a little, but for anyone else who is stuck on this:

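    for col_i in new_col_titles:
        # Work on strings so the .str accessor is available
        df[col_i] = df[col_i].astype(str)
        # Remove the non-breaking space (\xa0)
        df[col_i] = df[col_i].str.replace('\xa0', '', regex=False)
        # regex=False so '$' is a literal dollar sign, not the end-of-string anchor
        df[col_i] = df[col_i].str.replace('$', '', regex=False)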
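A more compact variant, offered as a sketch rather than a definitive fix: strip every character that is not a digit, a decimal point, or a minus sign in a single regex pass, then convert. The sample DataFrame and the helper name `to_float` are made up for illustration, and the approach assumes the values are plain currency strings like the one in the error message (no parentheses for negatives, no exponent notation).

```python
import pandas as pd

# Hypothetical sample data mirroring the string from the error message.
df = pd.DataFrame({
    "name": ["a", "b"],
    "budget": ["\xa0$ 25,507,036", "\xa0$ 21,550,568"],
})


def to_float(series: pd.Series) -> pd.Series:
    """Drop everything except digits, '.', and '-', then convert to float."""
    cleaned = series.astype(str).str.replace(r"[^0-9.\-]", "", regex=True)
    # errors="coerce" turns anything still unparseable into NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce").astype(float)


# Convert every column except the first, as in the question.
for col in df.columns[1:]:
    df[col] = to_float(df[col])
```

Because the character class removes the non-breaking space, the dollar sign, the commas, and the regular spaces all at once, there is no need to chain separate replace calls.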
JPWilson