I've noticed that when certain Numpy float64 values are saved as Excel file (via a Pandas DataFrame), they get changed. First I thought this has to do with some imprecision in Excel, but Excel seems to encode floating point numbers as double precision, so I am a bit confused about this observation.
>>> import numpy as np
>>> import pandas as pd
# Create a floating point number that exhibits the problem.
>>> ba = bytearray(['\x53', '\x2a', '\xb0', '\x49', '\xf3', '\x79', '\x90', '\x40'])
>>> ba
bytearray(b'S*\xb0I\xf3y\x90@')
>>> f = np.frombuffer(ba)
>>> f[0]
1054.4875857854684
# Write to dataframe to save as Excel file.
>>> df = pd.DataFrame({'a': f})
>>> df.to_excel('test.xlsx', engine='xlsxwriter')
# Read excel file (when viewing the file in LibreOffice, the
# value isn't 1054.4875857854684 any more).
>>> df2 = pd.read_excel('test.xlsx')
>>> df2.ix[0,'a']
1054.4875857854699
>>> df2.ix[0,'a'] == f[0]
False
Why is it not possible to read the same float64 back from Excel that was previously written?
I also tried this with Openpyxl
(.xlsx format) and Xlwt
(.xls format) as engines. While the former produced the same erroneous result as xlsxwriter
, Xlwt
was actually working as expected and wrote the float according to the exact variable value. Is there perhaps a parameter that I miss for the .xlsx
format writer engines?
# this uses the xlwt engine
>>> df.to_excel('test.xls')
>>> df2 = pd.read_excel('test.xls')
>>> df2.ix[0,'a'] == f[0]
True