I have a csv file with some cells that have dollar signs (e.g. $46.5
). I am forcing all the types to be numpy.float64
in the function pandas.read_csv()
. It complains about ValueError: could not convert string to float: $46.5
. Is there a way to handle this cleanly?
Asked
Active
Viewed 6,008 times
5

nwly
- 1,360
- 4
- 16
- 30
-
1You'd have to strip and cast as a post-processing step or pass a method to `converters` param for `read_csv` – EdChum Mar 30 '16 at 21:47
-
Are you trying to use floating point numbers for currency? You probably should be using `decimal`: http://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency – Alex Taylor Mar 30 '16 at 21:53
1 Answers
12
You can add a converter for the relevant column(s):
pd.DataFrame({'col1': ['$46.51', '$38.00', 40],
'col2': [1, 2, 3]}).to_csv('test_df.csv', index=False)
>>> pd.read_csv('test_df.csv', converters={'col1': lambda s: float(s.replace('$', ''))})
col1 col2
0 46.51 1
1 38.00 2
2 40.00 3

Alexander
- 105,104
- 32
- 201
- 196
-
4You'll probably need to add the same logic for commas, or it will bomb trying to replace $1,437.22 – flyingmeatball Mar 31 '16 at 13:56
-
2