Try the following test:
Create the following DataFrame with read_csv, but from a text buffer:

from io import StringIO
import pandas as pd

txt = '''c1,c2,c3
Xxxxx,4.2515014131285567e-001,4.2515014131285555e-001
Yyyyy,4.2515014131284444e-001,4.2515014131283333e-001
Zzzzz,4.2515014131282222e-001,4.2515014131281111e-001'''
df = pd.read_csv(StringIO(txt))
Then check the column types with df.info().
For both the c2 and c3 columns you should get the float64 type.
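With a recent pandas version the output should look roughly like this (the exact layout varies between versions):

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   c1      3 non-null      object
 1   c2      3 non-null      float64
 2   c3      3 non-null      float64
dtypes: float64(2), object(1)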
If you execute df.c2 * 2, you should get doubled values.
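For example, with pandas' default display precision of 6 digits, all three doubled values print identically:

df.c2 * 2
# 0    0.850300
# 1    0.850300
# 2    0.850300
# Name: c2, dtype: float64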
Don't worry about the smaller number of decimal digits shown; that is just a matter of pandas display options.
You can display an individual number with almost full precision using df.loc[0, 'c2'] (I got 0.4251501413128557).
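If you want more digits printed for the whole column, you can raise the display precision option (a sketch; the default is 6):

pd.set_option('display.precision', 16)
print(df.c2)
# 0    0.4251501413128557
# 1    0.4251501413128444
# 2    0.4251501413128222
# Name: c2, dtype: float64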
You should get the same results even if the numbers were surrounded with, e.g., double quotes.
Up to now everything was OK, but now try a second test:
In row 3, column c2, remove the e in front of -001, so this value becomes
4.2515014131282222-001, and run read_csv again.
The changed value is no longer a properly formatted float, so read_csv
assumes the object type (actually strings) for the whole c2 column
(you can confirm this with df.info()).
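A minimal sketch reproducing this effect (bad_txt is just an illustrative name; the corrupted value sits in the Zzzzz row):

from io import StringIO
import pandas as pd

bad_txt = '''c1,c2,c3
Xxxxx,4.2515014131285567e-001,4.2515014131285555e-001
Yyyyy,4.2515014131284444e-001,4.2515014131283333e-001
Zzzzz,4.2515014131282222-001,4.2515014131281111e-001'''
df = pd.read_csv(StringIO(bad_txt))
print(df.dtypes)  # c2 is now object, c3 is still float64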
My assumption is that somewhere in your text file the format of a number
is similarly "corrupted", and just this prevents read_csv from reading
that column as float.
To find the place where this error originates, run:
df.c2 = pd.to_numeric(df.c2, errors='coerce')
(replacing c2 with the proper column name) and then look in this column
for NaN values.
Then look at the corresponding row in the input file and correct the error.
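To locate those NaN values programmatically, a short sketch:

bad_rows = df[df.c2.isna()]  # rows where to_numeric produced NaN
print(bad_rows)              # their index points at the corrupted input rows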
Alternative: df.dropna(inplace=True) removes each row containing NaN in any
column. You may also add the subset=['column_name'] parameter to drop only
rows with NaN in this one column.
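For example:

df.dropna(subset=['c2'], inplace=True)  # drop only rows where c2 is NaN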