I've stumbled upon a small issue when using a pandas DataFrame:
I have a big CSV file (around 2 GB of data) containing the price of an asset, created with pandas' DataFrame.to_csv()
function. On closer inspection, the first lines look like this:
DateTime,open,high,low,close
2016-01-04 00:36:18,1.08505,1.08505,1.08504,1.08504
2016-01-04 00:36:19,1.08505,1.08505,1.08504,1.08504
2016-01-04 00:36:20,1.08503,1.08503,1.08495,1.08495
2016-01-04 00:36:21,1.0849600000000001,1.0849600000000001,1.0849600000000001,1.0849600000000001
2016-01-04 00:36:22,1.0849600000000001,1.0849600000000001,1.08492,1.08492
The data was created with df.resample('1s').ohlc().
I thought there might be some rounding issues, so I tried rounding the DataFrame with df.round(5)
to keep only 5 decimal places, but it doesn't change anything at all:
SEC = pd.read_csv(
    r"D:\Finance python\Data\EUR_USD\Sec\S1_2015.csv",
    index_col='DateTime',
    parse_dates=True,
    error_bad_lines=False,
    infer_datetime_format=True,
)
SEC = SEC.round(5)
The DataFrame stays the same, and I truly wonder why.
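My suspicion is that this is a binary floating-point representation issue rather than a pandas bug: 1.08496 has no exact double representation, so round(5) just hands back the nearest double, which is the value already stored. A minimal check using only the standard library (my own sketch, not pandas-specific):

```python
from decimal import Decimal

x = 1.08496

# 1.08496 cannot be stored exactly as a binary double; Decimal(float)
# reveals the exact value that is actually stored:
print(Decimal(x))

# Rounding to 5 decimal places returns the nearest double to 1.08496,
# which is the double we already have, so nothing changes:
print(round(x, 5) == x)  # True
```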
When I try it with a csv file containing the 5 rows I gave above:
In[13]: SEC["open"][3]
Out[13]: 1.0849599999999999
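For anyone who wants to reproduce this without a file on disk, here is a self-contained version of the snippet above, reading the five sample rows from a string instead of my local path:

```python
import io

import pandas as pd

# The five sample rows from the top of the question
csv_text = """DateTime,open,high,low,close
2016-01-04 00:36:18,1.08505,1.08505,1.08504,1.08504
2016-01-04 00:36:19,1.08505,1.08505,1.08504,1.08504
2016-01-04 00:36:20,1.08503,1.08503,1.08495,1.08495
2016-01-04 00:36:21,1.0849600000000001,1.0849600000000001,1.0849600000000001,1.0849600000000001
2016-01-04 00:36:22,1.0849600000000001,1.0849600000000001,1.08492,1.08492
"""

SEC = pd.read_csv(io.StringIO(csv_text), index_col="DateTime", parse_dates=True)
SEC = SEC.round(5)

# Depending on the Python/pandas version the printed form may be short
# ("1.08496") or long ("1.0849599999999999"), but either way round(5)
# did not change the stored double:
print(SEC["open"].iloc[3])
```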
It's not an issue when doing calculations over the DataFrame (even though it might be faster with fewer decimals), but it seems like a lot of trailing 0s and 9s are being stored in my CSV files for nothing, taking up extra space.
It also seems that even values that look fine in the CSV file are not actually well rounded when loaded with pandas.
Would anyone have an idea of why the DataFrame is not being rounded properly, or of a solution for getting shorter CSV files when I save them with pandas?
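The only workaround I've found so far is the float_format argument of to_csv, which trims the text written to disk without touching the in-memory values (a small sketch, not yet tried on the full 2 GB file):

```python
import pandas as pd

df = pd.DataFrame({"open": [1.0849600000000001], "close": [1.08492]})

# float_format only changes the textual representation in the CSV;
# the doubles held in memory are unchanged
out = df.to_csv(float_format="%.5f", index=False)
print(out)
# open,close
# 1.08496,1.08492
```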
Thanks in advance
Edit: I tried using the Decimal type, but it still doesn't work. I believe this is because pandas cannot store Decimal numbers natively in a DataFrame and ends up converting them back to float.
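A quick check (my own sketch) seems to confirm this: a column of Decimal objects gets the generic object dtype rather than float64, and any cast back to float loses the Decimal precision again:

```python
from decimal import Decimal

import pandas as pd

# Decimal values are stored as generic Python objects, not as a
# native numeric dtype:
s = pd.Series([Decimal("1.08496"), Decimal("1.08492")])
print(s.dtype)  # object

# Casting back to float returns to ordinary binary doubles:
print(s.astype(float).dtype)  # float64
```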