I am rewriting some data analysis code to use Pandas (which I have only just discovered) on Ubuntu 14.04 64-bit, and I have run into some strange behaviour. My data files look like this:
26/09/2014 00:00:00 2.423009 -58.864655 3.312355E-7 6.257226E-8 302 305
26/09/2014 00:00:00 2.395637 -62.73302 3.321525E-7 7.065322E-8 302 305
26/09/2014 00:00:01 2.332541 -57.763269 3.285718E-7 6.873837E-8 302 305
26/09/2014 00:00:02 2.366828 -51.900812 3.262279E-7 7.397762E-8 302 305
26/09/2014 00:00:03 2.435500 -40.820161 3.241068E-7 6.777224E-8 302 305
26/09/2014 00:00:04 2.428922 -65.573049 3.212358E-7 6.761804E-8 302 305
26/09/2014 00:00:05 2.419931 -59.414711 3.185517E-7 7.243236E-8 302 305
26/09/2014 00:00:06 2.416663 -60.064279 3.209795E-7 6.242328E-8 302 305
26/09/2014 00:00:07 2.411954 -52.586242 3.184297E-7 5.825581E-8 302 304
26/09/2014 00:00:08 2.457342 -61.874388 3.151493E-7 6.327384E-8 303 304
The columns are tab-separated. To read these into Pandas, I am using the following simple commands:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_csv("path/to/file.dat", sep="\t", header=None)
print(data)
This produces the following output:
0 1 2 3 4 5 6 7
0 26/09/2014 00:00:00 2.423009 -58.864655 0 6.257226e-08 302 305
1 26/09/2014 00:00:00 2.395637 -62.733020 0 7.065322e-08 302 305
2 26/09/2014 00:00:01 2.332541 -57.763269 0 6.873837e-08 302 305
3 26/09/2014 00:00:02 2.366828 -51.900812 0 7.397762e-08 302 305
4 26/09/2014 00:00:03 2.435500 -40.820161 0 6.777224e-08 302 305
5 26/09/2014 00:00:04 2.428922 -65.573049 0 6.761804e-08 302 305
6 26/09/2014 00:00:05 2.419931 -59.414711 0 7.243236e-08 302 305
7 26/09/2014 00:00:06 2.416663 -60.064279 0 6.242328e-08 302 305
8 26/09/2014 00:00:07 2.411954 -52.586242 0 5.825581e-08 302 304
9 26/09/2014 00:00:08 2.457342 -61.874388 0 6.327384e-08 303 304
[10 rows x 8 columns]
The important thing to notice here is column 4. Compare it to column 5 and to the original data: column 5 is rendered in scientific notation, while column 4 is displayed as 0. Pandas hasn't actually zeroed out the column or converted it to int, because:
>>> data[4][0]*1e7
3.3123550000000002
This is what I would expect, so the underlying values are unchanged and only their representation differs. If this is purely cosmetic I could put up with it, but it makes me uneasy and I'd like to know what's going on here.
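For anyone who wants to reproduce this without my files, here is a minimal sketch of the check I have in mind. It assumes an in-memory sample (two rows copied from the data above) in place of the real file, and the "{:.6e}" format string is just my own choice for illustration. It confirms the dtype of column 4 and asks for an explicit float format when rendering, which should only affect how the values are shown:

```python
import io
import pandas as pd

# Two rows copied from the data above, tab-separated.
# This in-memory string is a stand-in for the real file (an assumption for the sketch).
sample = (
    "26/09/2014\t00:00:00\t2.423009\t-58.864655\t3.312355E-7\t6.257226E-8\t302\t305\n"
    "26/09/2014\t00:00:00\t2.395637\t-62.73302\t3.321525E-7\t7.065322E-8\t302\t305\n"
)
data = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

# Inspect the stored type and value of column 4, independent of how it is printed.
col4_dtype = data[4].dtype
first_value = data[4][0]

# Render with an explicit float format; this changes the display only,
# not the values held in the DataFrame.
rendered = data.to_string(float_format="{:.6e}".format)

print(col4_dtype)
print(first_value)
print(rendered)
```

If the dtype comes back as a float type and the explicitly formatted output shows the expected digits, that would suggest the zeros are a display artefact rather than a parsing problem, though it would not explain why the two columns are treated differently by default.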