
I am loading data from a .csv file into a pandas DataFrame. I have already tried fixing the source data, but there is nothing I can do there.

I called `pd.read_csv` and a few columns come through with dtype `object`. I tried the approach below, but I am getting errors. Can somebody please help me solve this problem? I need to convert the strings to numbers. Thank you.

import pandas as pd

target_cols = ['A', 'B', 'C']
df_full = pd.read_csv('file.csv')
for col in target_cols:
    df_full[col] = df_full[col].astype(float)

The traceback is:

    df_full[col] = df_full[col].astype(float)
  File "C:\Users\ED397JT\Anaconda3\lib\site-packages\pandas\core\generic.py", line 5815, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "C:\Users\ED397JT\Anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 418, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "C:\Users\ED397JT\Anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 327, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\Users\ED397JT\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py", line 592, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "C:\Users\ED397JT\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 1309, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "C:\Users\ED397JT\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 1257, in astype_array
    values = astype_nansafe(values, dtype, copy=copy)
  File "C:\Users\ED397JT\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 1201, in astype_nansafe
    return arr.astype(dtype, copy=True)
ValueError: could not convert string to float: ' 4,992,227,200 '

N27
  • It appears the values in your column use commas as thousands separators (`' 4,992,227,200 '`). You can pass the thousands separator to `read_csv`, e.g. `pd.read_csv('file.csv', thousands=',')` [like this answer](https://stackoverflow.com/a/22137890/15497888). Or you can remove the commas before converting: `df_full[col] = df_full[col].str.replace(',', '').astype(float)` [like this answer](https://stackoverflow.com/a/57106775/15497888) – Henry Ecker Oct 11 '21 at 18:28
  • An addition to @HenryEcker's comment: when using `df_full[col] = df_full[col].str.replace(',', '').astype(float)` and there is any column _not_ containing the thousands separator, you'll get another error: `AttributeError: Can only use .str accessor with string values!`, because the respective column will be interpreted as containing floats (or integers) and thus, `.str` cannot be used. You may use `df_full[col] = df_full[col].astype(str).str.replace(',', '').astype(float)` to include both cases. – Lenka Čížková Oct 11 '21 at 18:48
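
Putting the two comments above together, here is a minimal sketch of both suggestions. The inline CSV text is a made-up stand-in for `file.csv` (only the value `' 4,992,227,200 '` comes from the traceback), and `pd.to_numeric` with `errors='coerce'` is an extra assumption on my part so that any leftover non-numeric strings become `NaN` instead of raising.

```python
import io

import pandas as pd

# Hypothetical sample standing in for file.csv; only ' 4,992,227,200 '
# is taken from the question, the other values are invented.
csv_text = 'A,B,C\n" 4,992,227,200 ",1.5,"1,000"\n" 12,345 ",2.5,7\n'
target_cols = ['A', 'B', 'C']

# Option 1: ask read_csv to treat ',' as a thousands separator while parsing.
df_parsed = pd.read_csv(io.StringIO(csv_text), thousands=',')
print(df_parsed.dtypes)

# Option 2: load as-is, then clean each target column.
df_full = pd.read_csv(io.StringIO(csv_text))
for col in target_cols:
    cleaned = (
        df_full[col]
        .astype(str)                          # also covers columns already parsed as numbers
        .str.replace(',', '', regex=False)    # drop the thousands separators
        .str.strip()                          # drop the padding spaces
    )
    # Assumption: unparseable leftovers should become NaN rather than raise.
    df_full[col] = pd.to_numeric(cleaned, errors='coerce').astype(float)

print(df_full.dtypes)
```

Option 1 is usually the simpler choice when every numeric column uses the same separator. Option 2 is handy when only some columns need cleaning, and checking `df_full[target_cols].isna().sum()` afterwards shows how many values could not be converted.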
