I've run into a weird issue here. I have a dataframe df like below:

In [1561]: df
Out[1561]: 
      A     B
0  16.3  1.10
1  23.2  1.33
2  10.7 -0.43
3   5.7 -2.01
4   5.4 -1.86
5  23.5  3.14

I'm comparing every two adjacent rows of column A and storing the difference in a new column:

In [1562]: df['new_diff'] = (df.A - df.A.shift(-1)).fillna(0)
In [1563]: df
Out[1563]: 
      A     B  new_diff
0  16.3  1.10      -6.9
1  23.2  1.33      12.5
2  10.7 -0.43       5.0
3   5.7 -2.01       0.3
4   5.4 -1.86     -18.1
5  23.5  3.14       0.0

When I filter for rows where new_diff equals 5.0, I get an empty dataframe, even though row 2 shows 5.0. Filtering on < 5.0 or > 5.0 works, but row 2 shows up under < 5.0. See below:

In [1567]: df[df['new_diff'] == 5.0]
Out[1567]: 
Empty DataFrame
Columns: [A, B, new_diff]
Index: []

In [1568]: df[df['new_diff'] > 5.0]
Out[1568]: 
      A     B  new_diff
1  23.2  1.33      12.5

In [1569]: df[df['new_diff'] < 5.0]
Out[1569]: 
      A     B  new_diff
0  16.3  1.10      -6.9
2  10.7 -0.43       5.0
3   5.7 -2.01       0.3
4   5.4 -1.86     -18.1
5  23.5  3.14       0.0

What am I missing here?

Mayank Porwal

3 Answers

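The problem is floating-point precision: the stored value is 4.999999999999999, not exactly 5.0, and it is only displayed as 5.0. Compare with numpy.isclose instead: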

print(df['new_diff'].tolist())
[-6.899999999999999, 12.5, 4.999999999999999, 0.2999999999999998, -18.1, 0.0]

print(df[np.isclose(df['new_diff'], 5)])  # assumes: import numpy as np
      A     B  new_diff
2  10.7 -0.43       5.0
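
To make the behaviour reproducible end to end, here is a minimal sketch that rebuilds the dataframe from the question and contrasts exact equality with a tolerance-based comparison. The explicit `rtol`/`atol` values are an assumption chosen for illustration; `np.isclose` also works with its defaults, as in the call above.

```python
import numpy as np
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame({
    "A": [16.3, 23.2, 10.7, 5.7, 5.4, 23.5],
    "B": [1.10, 1.33, -0.43, -2.01, -1.86, 3.14],
})
df["new_diff"] = (df["A"] - df["A"].shift(-1)).fillna(0)

# Exact equality fails: 10.7 - 5.7 is stored as 4.999999999999999.
exact = df[df["new_diff"] == 5.0]

# Tolerance-based comparison; atol=1e-9 is an illustrative choice.
close = df[np.isclose(df["new_diff"], 5.0, rtol=0.0, atol=1e-9)]

print(exact.empty)           # True
print(close.index.tolist())  # [2]
```

Pick the tolerance to suit the scale of your data; a purely absolute tolerance like this one is only sensible when the values are not extremely large or small.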
jezrael

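Or convert 'new_diff' to a string inside the filter (this doesn't modify the actual data), then check whether it equals '5.0'. Note that this depends on how floats are converted to strings, so it may not match on every Python/pandas version: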

print(df[df['new_diff'].astype(str)=='5.0'])

Output:

      A     B  new_diff
2  10.7 -0.43       5.0
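
If the comparison really should be done on text, formatting to a fixed number of decimals makes the rounding explicit instead of relying on the default float-to-string conversion. This is a sketch of that variant, not the approach shown above; the sample values are copied from the `tolist()` output earlier on the page, and the one-decimal format is an assumption.

```python
import pandas as pd

# Sample values taken from the new_diff column in the question.
s = pd.Series([-6.899999999999999, 12.5, 4.999999999999999])

# Format each float with exactly one decimal place, then compare as text.
as_text = s.map("{:.1f}".format)
mask = as_text == "5.0"

print(as_text.tolist())  # ['-6.9', '12.5', '5.0']
print(mask.tolist())     # [False, False, True]
```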
U13-Forward

Just round before comparing (note that round() with no arguments rounds to the nearest integer), i.e.:

df[df['new_diff'].round() == 5.0]

      A     B  new_diff
2  10.7 -0.43       5.0
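
One caveat worth illustrating: rounding to an integer also matches values such as 5.3. Passing a number of decimals to `round` keeps the comparison tighter. A minimal sketch, where the sample values and the choice of 6 decimals are assumptions for illustration:

```python
import pandas as pd

s = pd.Series([4.999999999999999, 5.3, 0.2999999999999998])

# Rounding to the nearest integer: 5.3 also becomes 5.0.
coarse = s.round() == 5.0

# Rounding to 6 decimals removes the float noise but keeps 5.3 distinct.
fine = s.round(6) == 5.0

print(coarse.tolist())  # [True, True, False]
print(fine.tolist())    # [True, False, False]
```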
Bharath M Shetty