I have two CSV files with the same columns but different data. After importing them into pandas, the dtype of the column cost is float in one of them, but object in the other.
I found a similar question; in that case the answer, according to Andy Hayden, was "this was a bug in <=0.12 (but is fixed in 0.13)".
But my question is: both of my CSV files have a similar minimum value, 1.000000e-02, and neither has any blank values.
(I'm using Python 3.7 and pandas 0.23.4 in PyCharm 2018.2.)
# csv 1: before pd.to_numeric
count 174526
unique 84873
top 0.41
freq 505
Name: cost, dtype: object
# csv 1: after pd.to_numeric
count 1.745260e+05
mean 3.608746e+04
std 4.690326e+05
min 1.000000e-02
25% 1.040000e+01
50% 1.190400e+02
75% 1.433350e+03
max 5.400000e+07
Name: cost, dtype: float64
# csv 2:
count 2.578860e+05
mean 1.588632e+04
std 3.295925e+05
min 1.000000e-02
25% 2.820000e+00
50% 2.109000e+01
75% 2.426200e+02
max 6.030000e+07
Name: cost, dtype: float64
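One way to check where the object dtype comes from is to look for entries that pandas cannot parse as numbers, since read_csv falls back to a non-numeric dtype for a column as soon as one value in it is not numeric. The following is only a minimal sketch: the inline sample data, the value 'unknown' and the helper name non_numeric_values are hypothetical stand-ins, not taken from the real files.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV: one non-numeric entry in "cost".
sample = io.StringIO("Cloumn1,cost\nA,0.41\nValue1,unknown\nB,12.5\n")
df = pd.read_csv(sample, low_memory=False)


def non_numeric_values(series):
    """Return the entries of `series` that pd.to_numeric cannot parse."""
    converted = pd.to_numeric(series, errors="coerce")
    # Entries that were present originally but became NaN after coercion.
    mask = converted.isna() & series.notna()
    return series[mask]


bad = non_numeric_values(df["cost"])
print(df["cost"].dtype)
print(bad.tolist())  # ['unknown']
```

If the list is empty on the real data after filtering, the offending values may sit only in the rows that the filter removes.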
Looking at it another way, if I break my code into two parts (writing the filtered data to a file and reading it back before dividing), everything is fine for csv2:
import pandas as pd

df = pd.read_csv('file_name.csv', low_memory=False)
df = df[df.Cloumn1 != 'Value1']
df['cost_T'] = df['cost'] / 1000
df.to_csv('new_file_name.csv', index=False)
"""
TypeError: unsupported operand type(s) for /: 'str' and 'int'
"""
df = pd.read_csv('file_name.csv', low_memory=False)
df = df[df.Cloumn1 != 'Value1']
df.to_csv('new_file_name.csv', index=False)
df = pd.read_csv('new_file_name.csv', low_memory=False)
df['cost_T'] = df['cost'] / 1000
df.to_csv('final_file_name.csv', index=False)
"""
everything is fine.
"""
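The round trip through a second file presumably works because read_csv re-infers the dtype of cost from the rows that are left after filtering. A possible one-pass variant, sketched under the assumption that every remaining value is a numeric string (as the pd.to_numeric output for csv 1 suggests), converts the column explicitly before dividing. The inline sample data and the value 'unknown' are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV; the 'Value1' row carries a
# non-numeric cost, so the whole column is read in as strings.
sample = io.StringIO("Cloumn1,cost\nA,0.41\nValue1,unknown\nB,12.5\n")
df = pd.read_csv(sample, low_memory=False)

# Filter first, then take a copy so the assignments below do not
# operate on a view of the original frame.
df = df[df.Cloumn1 != 'Value1'].copy()

# Raises ValueError if a non-numeric string is still present.
df['cost'] = pd.to_numeric(df['cost'])
df['cost_T'] = df['cost'] / 1000
```

With this, no intermediate file is needed, and a leftover non-numeric value is reported at the conversion step instead of as a TypeError during the division.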
If someone has any idea, please let me know.