I think you need to add the parameter skipinitialspace to read_csv:
skipinitialspace : boolean, default False, Skip spaces after delimiter
Test:
import pandas as pd
import numpy as np
import io
temp=u"""uid, f_1, f_2
1, "1", 1.19
2, "2", 2.3
3, "0", 4.8"""
print(pd.read_csv(io.StringIO(temp)))
   uid  f_1   f_2
0    1  "1"  1.19
1    2  "2"  2.30
2    3  "0"  4.80
# specifying dtype alone doesn't work
print(pd.read_csv(io.StringIO(temp), dtype={'f_1': np.int64}).dtypes)
uid      int64
f_1     object
f_2    float64
dtype: object
# with skipinitialspace=True, f_1 is parsed as int64
print(pd.read_csv(io.StringIO(temp), skipinitialspace=True).dtypes)
uid      int64
f_1      int64
f_2    float64
dtype: object
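To see why dtype alone had no effect, it can help to look at the column names the parser actually produced. This is only a small sketch on the same sample data (the names raw and clean are mine): without skipinitialspace the space after each comma stays part of the header names and of the values, so the key 'f_1' does not match any column.

```python
import io
import pandas as pd

temp = u"""uid, f_1, f_2
1, "1", 1.19
2, "2", 2.3
3, "0", 4.8"""

# Without skipinitialspace the space after each comma is kept,
# so every header name after the first starts with a space.
raw = pd.read_csv(io.StringIO(temp))
print(list(raw.columns))    # ['uid', ' f_1', ' f_2']

# With skipinitialspace=True the space is skipped, the double quotes
# are treated as normal field quoting, and f_1 is parsed as integers.
clean = pd.read_csv(io.StringIO(temp), skipinitialspace=True)
print(list(clean.columns))  # ['uid', 'f_1', 'f_2']
print(clean.dtypes)
```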
If you want to remove the first and last " characters from column f_1, use converters:
import pandas as pd
import io
temp=u"""uid, f_1, f_2
1, "1", 1.19
2, "2", 2.3
3, "0", 4.8"""
print(pd.read_csv(io.StringIO(temp)))
   uid  f_1   f_2
0    1  "1"  1.19
1    2  "2"  2.30
2    3  "0"  4.80
# remove the surrounding " characters
def converter(x):
    return x.strip('"')
# define a converter for each column that needs one
converters = {'f_1': converter}
df = pd.read_csv(io.StringIO(temp), skipinitialspace=True, converters=converters)
print(df)
   uid  f_1   f_2
0    1    1  1.19
1    2    2  2.30
2    3    0  4.80
print(df.dtypes)
uid      int64
f_1     object
f_2    float64
dtype: object
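The converter above returns strings, which is why f_1 stays object. If you want numbers instead, one option is to convert inside the converter. This is a sketch, not the only way: to_int is a name I made up, and as far as I know pandas then infers an integer dtype from the returned values, so check df.dtypes to confirm on your version.

```python
import io
import pandas as pd

temp = u"""uid, f_1, f_2
1, "1", 1.19
2, "2", 2.3
3, "0", 4.8"""

def to_int(x):
    # strip any surrounding double quotes, then convert the text to int
    return int(x.strip('"'))

df = pd.read_csv(io.StringIO(temp), skipinitialspace=True,
                 converters={'f_1': to_int})
print(df)
print(df.dtypes)
```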
If you need to convert the integer column f_1 to string, use dtype:
import pandas as pd
import io
temp=u"""uid, f_1, f_2
1, 1, 1.19
2, 2, 2.3
3, 0, 4.8"""
print(pd.read_csv(io.StringIO(temp)).dtypes)
uid      int64
f_1      int64
f_2    float64
dtype: object
df = pd.read_csv(io.StringIO(temp), skipinitialspace=True, dtype={'f_1': str})
print(df)
   uid  f_1   f_2
0    1    1  1.19
1    2    2  2.30
2    3    0  4.80
print(df.dtypes)
uid      int64
f_1     object
f_2    float64
dtype: object
Note: Don't forget to change io.StringIO(temp) to 'a.csv'.
An explanation of str vs object is here.
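As for the note about a.csv, here is a rough sketch of the same call against a real file. The temporary directory is only there to make the example runnable on its own; in your code you would pass the path to your own file.

```python
import os
import tempfile
import pandas as pd

temp = u"""uid, f_1, f_2
1, 1, 1.19
2, 2, 2.3
3, 0, 4.8"""

# write the sample data to a real file; 'a.csv' stands in for your own file
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'a.csv')
with open(path, 'w') as f:
    f.write(temp)

# pass the file name where the examples above passed io.StringIO(temp)
df = pd.read_csv(path, skipinitialspace=True, dtype={'f_1': str})
print(df)
```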