
I am trying to manipulate a dataframe. One value in a list that I use to append a column to the dataframe is 161137531201111100. I created a dictionary whose keys are the unique values of this column, and I use this dictionary in further operations. This code used to run perfectly before.

However, after trying this code on another data I had the following error:

KeyError: 1.611375312011111e+17

which means that this value is not a key of the dictionary. I tried to trace the code and everything seemed to be okay. However, when I opened the CSV file of the dataframe I built, I found that the value causing the problem is 161137531201111000, which is not in the list (and of course not a key in the dictionary) I used to create this column of the dataframe. This seems weird, and I don't know the reason. Is there any reason that a number would be saved in another way?

And how can I save it as it is in all phases? Also, why did it change in the csv?

marc_s
Mee
  • 1
    They are equal, yes. – tkausl Oct 26 '20 at 15:51
  • 1
    @tkausl in Python, no, they are not. `print(1.611375312011111e+17 == 161137531201111000)` shows "False". – Pac0 Oct 26 '20 at 15:52
  • 2
    Numbers with `.` are floats and inexact. Therefore comparing floats with other numbers may fail. Without `.` they are integers and exact (but can overflow under some conditions). – Michael Butscher Oct 26 '20 at 15:52
  • 1
    Since you want to use numbers as dict keys, I think the point here is to use dtype `int` consistently. If you bring `float` into the game somewhere, you could fall into the pitfalls of floating point arithmetic (comparison for equality...). – FObersteiner Oct 26 '20 at 16:02

1 Answer


No, unfortunately they are not equal:

print(1.611375312011111e+17 == 161137531201111000)  # False

The problem lies in the way floating-point numbers are handled by computers in general, and by most programming languages, including Python.

Always use integers (and not "too large") when doing computations if you want exact results.

See "Is floating point math broken?" for a generic explanation that you definitely must know as a programmer, even if it's not specific to Python.

(Be aware that Python does a rather good job of keeping precision on integers, which are arbitrary-precision; unfortunately, that does not carry over to floating-point numbers.)
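The question does not show where the float crept in, so the following is only a minimal sketch of "integers all the way through", using just the standard library. The second ID and the `row-N` labels are made up for illustration; the first ID is the value from the question.

```python
import csv
import io

# Hypothetical ID values; the first one is the value from the question.
ids = [161137531201111100, 161137531201111101]

# Passing through float silently changes the value.
assert int(float(ids[0])) == 161137531201111104

# Keeping the values as int end to end preserves every digit.
lookup = {i: f"row-{n}" for n, i in enumerate(ids)}

# Write the IDs to an in-memory CSV (a stand-in for a real file).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id"])
writer.writerows([i] for i in ids)

# Read them back and parse as int, never as float.
buffer.seek(0)
reader = csv.DictReader(buffer)
restored = [int(row["id"]) for row in reader]

print(restored == ids)       # True
print(lookup[restored[0]])   # row-0
```

If the values are identifiers rather than quantities you do arithmetic on, storing them as strings is another way to make sure no step can reinterpret them as floats.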

And just for the sake of "fun" with floating point numbers, 1.611375312011111e+17 is actually equal to the integer 161137531201111104!

print(format(1.611375312011111e+17, ".60g"))       # shows 161137531201111104
print(1.611375312011111e+17 == 161137531201111104) # True

a = {}
a[1.611375312011111e+17] = "hello"
# print(a[161137531201111100])      # raises KeyError, as in the question
print(a[161137531201111104])        # this one prints "hello" properly!
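To see why the float lands on ...104 rather than ...100, it may help to look at the gap between neighbouring floats at this magnitude. This is a small sketch, assuming Python 3.9 or later for `math.ulp`:

```python
import math

x = 1.611375312011111e+17

# Gap between x and the next representable float at this magnitude.
print(math.ulp(x))                      # 32.0

# Above 2**53, not every integer has an exact float representation.
print(float(2**53 + 1) == 2**53 + 1)    # False

# The original integer is only 4 away from the nearest float,
# so converting it to float snaps it to that neighbour.
original = 161137531201111100
print(int(float(original)) - original)  # 4
```

Floats near 1.6e17 are spaced 32 apart, so 161137531201111100 simply has no float of its own.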
Pac0
  • different types *should not* compare equal, no? I mean, without taking into account their numerical value in this case. If *both* were of type float here, it would be the comparison for equality problem - if used as dict keys. – FObersteiner Oct 26 '20 at 15:59
  • 1
    @MrFuppes Comparison on numbers is done numerically in Python. I mean, the type is not compared as a strict requirement. For instance, `print(1.0 == 1)` prints "True". And it works for dict: `a = dict()` `a[1] = "hello"` `print(a[1.0])` properly prints "hello". So, no, the problem is not the type, it's really that `1.611375312011111e+17` does not equal the integer number `161137531201111000`, due to the floating-point numbers' inherent imprecision. – Pac0 Oct 26 '20 at 16:03
  • ok thanks for the clarification - that seems like too much convenience to me ^^ wasn't aware of that (anymore), got more into strictly typed languages lately – FObersteiner Oct 26 '20 at 16:06
  • Yes, that is indeed potentially confusing. Python is less tedious, especially for learning programming, and is thus very popular... but once you dig a bit deeper you need to be aware of the details. Thanks for the comment anyway, I also had a doubt and had to check this as well :P – Pac0 Oct 26 '20 at 16:08