0

I'm trying to load in a xlsx file on python, but I am finding that some of the values are inaccurate. I am aware that using Decimal is more accurate, but I find that the raw value given from the dataset is inaccurate. This is the row from the xlsx file with the problem value highlighted. And this is the output when I pull that value from the dataframe. This is how I loaded the data

import pandas as pd
data = pd.read_excel(r'A:\Deakin\Crypto\Export Trade History-2021-07-02 10_40_25.xlsx')
df = pd.DataFrame(data)

And this is the value when I print the dataframe. I'm pretty flexible in how the file gets loaded and held in the program, i.e if there's something better than pandas to read and store in a data frame, I'm all ears

  • You're probably just seeing the normal floating point inaccuracies, see [Is floating point math broken?](https://stackoverflow.com/q/588004/5987) Excel applies some rounding when it displays a number. – Mark Ransom Jul 02 '21 at 02:58
  • I am using the same method to read excel files and hitting the same problem. In my case, the cell format is set to zero decimal places, but the underlying data has up to 4 decimal places. This actually makes no sense because the values represent 'thousands of bags of coffee', which implies that they are counting a fraction of a bag. – RustyB Jul 02 '21 at 03:31

1 Answers1

0

The problem here is about the difference of the actual value and the display of the value. The actual value stored in your excel is 17.9659xxx. However, excel only display it as 17.966, because it only shows 3 decimals due to some settings. You actually can change number of decimals to be displayed to see the actual value. And in the pandas DataFrame, when you do print(df), it actually displays the value with 3 decimals by default. When you do df['Amount'][633], it actually prints more decimals close to the actual float64 type value.

To put it short, the value you VIEW in excel is not the actual value stored. It has been rounded.

nngeek
  • 1,912
  • 16
  • 24
  • So would it be best if I were to round it before converting it to Decimal? – David Cuthbertson Jul 02 '21 at 04:16
  • @DavidCuthbertson , it depends on your data precision requirement. If you are OK to process data unto 3 or 4 decimal points, then you can just round it before processing. If you want very accurate result, then you better use the raw `float64` data type without any rounding. Only do the rounding in the final report or display. – nngeek Jul 02 '21 at 09:06
  • @DavidCuthbertson rounding *before* converting to Decimal won't help, because a `float` simply can't represent many numbers exactly. `decimal.Decimal(0.1)` for example returns `Decimal('0.1000000000000000055511151231257827021181583404541015625')` and you can't get a simpler case than that. – Mark Ransom Jul 02 '21 at 15:48