I am trying to create a dataframe in pandas using a CSV that is semicolon-delimited, and uses commas for the thousands separator on numeric data. Is there a way to read this in so that the type of the column is float and not string?
3 Answers
33
Pass `thousands=','` to `read_csv` so that the commas are treated as thousands separators and the values are parsed as numbers:
In [27]:
import pandas as pd
import io
t="""id;value
0;123,123
1;221,323,330
2;32,001"""
pd.read_csv(io.StringIO(t), thousands=r',', sep=';')
Out[27]:
   id      value
0   0     123123
1   1  221323330
2   2      32001
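Since the question asks for a float column and the sample values above have no decimal part, pandas infers an integer dtype for them. A minimal sketch of checking the dtype and converting afterwards, reusing the sample data from this answer (the names `raw`, `df` and `df_float` are only illustrative):

```python
import io

import pandas as pd

# Same sample data as in the answer above.
raw = """id;value
0;123,123
1;221,323,330
2;32,001"""

# sep=';' splits the columns, thousands=',' strips the grouping commas.
df = pd.read_csv(io.StringIO(raw), sep=";", thousands=",")
print(df.dtypes)  # "value" is numeric (an integer dtype here), not object/string

# If a float column is required, convert explicitly after reading.
df_float = df.astype({"value": "float64"})
print(df_float.dtypes)
```

If the real file contains decimal values, `read_csv` should infer a float dtype on its own, so the explicit conversion is only needed for whole-number data like this sample.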

EdChum
- What does the `r` stand for in the `thousands` field? – kotchwane Apr 26 '21 at 18:36
- @kotchwane The `r` makes it a [raw string literal](https://stackoverflow.com/q/2081640/13138364) (and is not actually necessary in this case). – tdy Jun 27 '22 at 03:09
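To illustrate the point about the raw string prefix: a comma contains no backslash escapes, so `r','` and `','` are the same string and both calls should parse identically. A quick check, again using the sample data from the answer (the variable names are only illustrative):

```python
import io

import pandas as pd

raw = """id;value
0;123,123
1;221,323,330
2;32,001"""

# The r prefix only changes how backslashes are interpreted; there are none here.
same_literal = (r"," == ",")

with_prefix = pd.read_csv(io.StringIO(raw), sep=";", thousands=r",")
without_prefix = pd.read_csv(io.StringIO(raw), sep=";", thousands=",")

# Both frames hold the same parsed values.
frames_match = with_prefix.equals(without_prefix)
print(same_literal, frames_match)
```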
10
The answer to this question should be short:
df=pd.read_csv('filename.csv', thousands=',')

Dimanjan
- With a `;` separator: `df = pd.read_csv('filename.csv', sep=";", thousands=',')` – Armin Okić Jul 06 '23 at 06:42