
I have a list of floats (actually it's a pandas Series object, if it changes anything) which looks like this:

mySeries:

...
22      16.0
23      14.0
24      12.0
25      10.0
26       3.1
...

(So the elements of this Series are on the right, the indices on the left.) Then I'm trying to use the elements of this Series as dictionary keys and the indices as values, like this:

{ mySeries[i]: i for i in mySeries.index }

and I'm getting pretty much what I wanted, except that...

{ 6400.0: 0, 66.0: 13, 3.1000000000000001: 26, 133.0: 10, ... }

Why has 3.1 suddenly changed into 3.1000000000000001? I guess this has something to do with how floating-point numbers are represented, but why does it happen now, and how do I avoid/fix it?

EDIT: Please feel free to suggest a better title for this question if it's inaccurate.

EDIT2: OK, so it seems that it's the exact same number, just printed differently. Still, if I use mySeries[26] as a dictionary key and then try to run:

myDict[mySeries[26]]

I get a KeyError. What's the best way to avoid it?

machaerus

2 Answers


The dictionary isn't changing the floating-point value of 3.1; it is actually displaying the full precision. Your print of mySeries[26] is truncating the precision and showing an approximation.

You can prove this:

pd.set_option('display.precision', 20)

Then view mySeries.

22    16.00000000000000000000
23    14.00000000000000000000
24    12.00000000000000000000
25    10.00000000000000000000
26     3.10000000000000008882
dtype: float64
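
If you don't want to change pandas display options, you can also ask Python directly for the stored value. This is a small sketch using only the standard library; the exact digits assume standard IEEE 754 doubles:

from decimal import Decimal

x = 3.1
print(format(x, '.20g'))   # 3.1000000000000000888
print(Decimal(x))          # 3.100000000000000088817841970012523233890533447265625
print(x == 3.1)            # True: there is only one stored value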

EDIT:

What Every Computer Scientist Should Know About Floating-Point Arithmetic is always a good read.

EDIT:

Regarding the KeyError, I was not able to replicate the problem.

>>> import pandas as pd
>>> x = pd.Series([16, 14, 12, 10, 3.1])
>>> a = {x[i]: i for i in x.index}
>>> a[x[4]]
4
>>> a.keys()
[16.0, 10.0, 3.1000000000000001, 12.0, 14.0]
>>> hash(x[4])
2093862195
>>> hash(a.keys()[2])
2093862195
Logan Byers

The value is already that way in the Series:

>>> import pandas as pd
>>> x = pd.Series([16, 14, 12, 10, 3.1])
>>> x
0    16.0
1    14.0
2    12.0
3    10.0
4     3.1
dtype: float64
>>> x.iloc[4]
3.1000000000000001

This has to do with floating point precision:

>>> import numpy as np
>>> np.float64(3.1)
3.1000000000000001

See "Floating point precision in Python array" for more information about this.
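
As a quick sanity check (a sketch, not taken from the linked question), the numpy scalar, the plain Python float, and the longer-looking literal all compare equal, because they are the same 64-bit value; only the printed precision differs:

>>> np.float64(3.1) == 3.1
True
>>> 3.1 == 3.1000000000000001
True
>>> (3.1).hex() == (3.1000000000000001).hex()
True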

Concerning the KeyError in your edit, I was not able to reproduce it. See below:

>>> d = {x[i]:i for i in x.index}
>>> d
{16.0: 0, 10.0: 3, 12.0: 2, 14.0: 1, 3.1000000000000001: 4}
>>> x[4]
3.1000000000000001
>>> d[x[4]]
4

My suspicion is that the KeyError is coming from the Series: what is mySeries[26] returning?
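
Here is a hedged sketch of that suspicion (the index values below are assumptions based on the question): with a non-default integer index, mySeries[26] is a label-based lookup, so if the label 26 were missing (for example after filtering), the Series itself would raise the KeyError before the dictionary is ever consulted.

>>> s = pd.Series([16.0, 14.0, 12.0, 10.0, 3.1], index=[22, 23, 24, 25, 26])
>>> d2 = {s[i]: i for i in s.index}
>>> s[26]          # label-based lookup, not positional
3.1000000000000001
>>> d2[s[26]]      # no KeyError when the key really comes from the Series
26
>>> s.iloc[4]      # the same element, looked up by position
3.1000000000000001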

brianpck