2

I have an Excel file like this :

enter image description here

When I try to read it as a dataframe and then convert it in a dict...

df = pd.read_excel(r"C:\Users\crd\Downloads\Classeur1.xlsx", sheet_name = "Feuil1", encoding="Latin-1")

print(df)

... I have this output :

  Label    Value1    Value2
0     C  -0.50000   1.80000
1     A  92.92768  92.92768

But when I would like to convert it into a dict, I have this output :

[
    {'Label': 'C', 'Value1': -0.5, 'Value2': 1.7999999999999998}, 
    {'Label': 'A', 'Value1': 92.92768, 'Value2': 92.92768}
]

Why does 1.8 becomes 1.7999999999999 after the dataframe being converted to dict and not when being created ?

Charles R
  • 1,621
  • 1
  • 8
  • 25
  • It helps wery much but I don't explain why the issue occurs after the convertion into a dict and not after the dataframe instantiation ? – Charles R Oct 30 '19 at 07:33
  • Printing a floating point value generally involves some rounding. It looks like printing the dataframe rounds to fewer decimal places. – Mark Ransom Oct 30 '19 at 12:30

1 Answers1

1

In python (and any programming language) real numbers are represented using a discrete number of bits of memory in the computer. This means that not all real numbers can be exactly represented past a given precision. Python has a thorough explanation of how this works in the documentation.

There are options for exactly representing a set of decimal numbers with fixed precision. For example you could use the built in Decimal class to represent your floating point numbers after reading them in from Excel. This could give you an exact representation of your data.

If you search "floating point representation" on SO you will find lots of interesting questions illustrating how this works. For example the accepted answer to this question explains how to view and interpret the exact binary representation of a float in Python.

A final thought on this question is that Excel is extremely good at displaying a value in a spreadsheet that is different from the underlying representation it has stored in the computer memory. There are all sorts of options for changing what is displayed. So I often like to write my Excel data to a .csv or other text file format and then read the text file into Python. The values in a text file will be a better reference point for comparing with the representation in Python.

lunguini
  • 896
  • 1
  • 9
  • 14
  • It helps wery much but I don't explain why the issue occurs after the convertion into a dict and not after the dataframe instantiation ? – Charles R Oct 30 '19 at 07:33
  • pandas is only displaying 5 significant digits when you call print. the actual representation of the floating point is unchanged but Pandas has rounded to display to you. to test this you can try `df.values[0,2].hex() == df_dict[0]['Value2'].hex()`. using `.hex()` you can see how the floating point numbers are actually stored in memory and check that they are the same – lunguini Oct 30 '19 at 17:37