Floating point comparisons don't yield the expected output in pandas

Question

I have the following NumPy array, which I converted to a DataFrame:

import numpy as np
import pandas as pd

data = np.array([('210', 0.5316666570181647, 0.99102639737063),
                 ('215', 0.5541666565152505, 0.9906073078204338),
                 ('220', 0.5658333229211469, 0.9905192216775841),
                 ('225', 0.6158333218035598, 0.9893290692391012),
                 ('230', 0.10499999988824131, 0.9999143781512333),
                 ('235', 0.061666665288309254, 0.9999999088637485),
                 ('240', 0.061666665288309254, 0.9999999088637485),
                 ('245', 0.061666665288309254, 0.9999999088637485)],
                dtype=[('index', '|O'), ('time', '<f8'), ('min_value', '<f8')])

df = pd.DataFrame(data)

Now I need to get only the rows whose min_value is less than 1.0. I tried the following, but it still returns the rows that display as 1.0:

df[df.min_value < 1]

@Georgy and Azat Ibrakov, this is not a typo. If anything, this is due to a lack of understanding of floating-point math. Please read the question carefully before voting to close, and FYI (not casting aspersions, just FYI) this question does not warrant a downvote on the answer either, thanks. – cs95 Apr 26 '20 at 22:41

cs95 · Accepted Answer · 2017-08-22T11:10:05.823


Looking at your data, the cause of the confusion is the way the floats are displayed. Every value in min_value is under 1, but by default pandas rounds floats to 6 decimal places for display (the display.precision option), so some of them are shown as 1.000000:

In [1131]: df
Out[1131]: 
  index      time  min_value
0   210  0.531667   0.991026
1   215  0.554167   0.990607
2   220  0.565833   0.990519
3   225  0.615833   0.989329
4   230  0.105000   0.999914
5   235  0.061667   1.000000
6   240  0.061667   1.000000
7   245  0.061667   1.000000
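To see what pandas is actually storing, here is a minimal sketch. It rebuilds only the index and min_value columns from the question and temporarily raises display.precision with pd.option_context; the value 17 is an arbitrary choice for illustration, not something the answer specifies. Converting the column with tolist() gives plain Python floats, whose repr shows the shortest string that round-trips to the stored double.

```python
import pandas as pd

# Rebuilt from the question; the "time" column is left out for brevity.
df = pd.DataFrame({
    "index": ["210", "215", "220", "225", "230", "235", "240", "245"],
    "min_value": [0.99102639737063, 0.9906073078204338, 0.9905192216775841,
                  0.9893290692391012, 0.9999143781512333, 0.9999999088637485,
                  0.9999999088637485, 0.9999999088637485],
})

# Default output: floats are rounded to display.precision (6) decimals.
print(df)

# Show more decimals only inside this block; the option is restored afterwards.
with pd.option_context("display.precision", 17):
    print(df)

# tolist() returns plain Python floats, whose repr round-trips exactly.
print(df["min_value"].tolist())

# The stored value is below 1 even though it formats as 1.000000.
x = df["min_value"].iloc[5]
print(f"{x:.6f}", x < 1)  # 1.000000 True
```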

But df.min_value < 1 is True for every row, because the comparison operates on the actual stored values, not on what is printed.

In [1133]: df.min_value < 1
Out[1133]: 
0    True
1    True
2    True
3    True
4    True
5    True
6    True
7    True
Name: min_value, dtype: bool
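One way to confirm this is to look at how far each value is from 1. A minimal sketch using only the min_value column from the question: every gap is strictly positive, and the smallest one (rows 5 to 7) is below 1e-7, well under the half-unit of the sixth decimal place at which the default display rounds up to 1.000000.

```python
import pandas as pd

# min_value column from the question.
min_value = pd.Series(
    [0.99102639737063, 0.9906073078204338, 0.9905192216775841,
     0.9893290692391012, 0.9999143781512333, 0.9999999088637485,
     0.9999999088637485, 0.9999999088637485],
    name="min_value",
)

# The comparison is done on the stored doubles, so every row passes.
print((min_value < 1).all())  # True

# Distance from 1: positive everywhere, but for rows 5-7 it is about 9e-8,
# too small to survive rounding to 6 displayed decimals.
gap = 1 - min_value
print(gap)
print(gap.min() < 1e-7)  # True
```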

As a solution, consider rounding the numbers before comparing, so that values which round to 1 are filtered out. For example, you can use np.around to round to 5 decimal places:

In [1136]: df[np.around(df.min_value, 5) < 1]
Out[1136]: 
  index      time  min_value
0   210  0.531667   0.991026
1   215  0.554167   0.990607
2   220  0.565833   0.990519
3   225  0.615833   0.989329
4   230  0.105000   0.999914

This applies the filter to the rounded values; the data in df itself is not modified.
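For reference, a self-contained sketch of the same filter, written with the pandas method Series.round in place of np.around, next to an alternative that is not part of the answer: np.isclose with an absolute tolerance. The tolerance atol=1e-5 is an assumption chosen to roughly match rounding to 5 decimals; pick one that fits your data. The last line checks that df is left unchanged.

```python
import numpy as np
import pandas as pd

# Rebuilt from the question; the "time" column is left out for brevity.
df = pd.DataFrame({
    "index": ["210", "215", "220", "225", "230", "235", "240", "245"],
    "min_value": [0.99102639737063, 0.9906073078204338, 0.9905192216775841,
                  0.9893290692391012, 0.9999143781512333, 0.9999999088637485,
                  0.9999999088637485, 0.9999999088637485],
})
original = df.copy()

# The answer's approach: filter on values rounded to 5 decimal places.
by_rounding = df[df["min_value"].round(5) < 1]

# Alternative (not from the answer): treat anything within an absolute
# tolerance of 1.0 as equal to 1.0. atol=1e-5 is an assumed threshold.
near_one = np.isclose(df["min_value"], 1.0, rtol=0, atol=1e-5)
by_tolerance = df[~near_one]

print(by_rounding)
print(by_tolerance)

# Boolean indexing returns new DataFrames; df itself is untouched.
print(df.equals(original))  # True
```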

edited Aug 22 '17 at 11:10

answered Aug 22 '17 at 10:45

cs95

Sorry, that was wrong. I edited the post again (minf[minf.min_value < 1]). – Samir Alhejaj Aug 22 '17 at 10:50
@SamirAlhejaj Everything in your DataFrame is less than 1. What do you expect? – cs95 Aug 22 '17 at 10:53
The query method does not work either; it still displays the rows with 1.0 values! – Samir Alhejaj Aug 22 '17 at 10:56
@SamirAlhejaj What you need to understand is that everything in your DataFrame is less than 1. For example, `0.9999999088637485` is rounded to `1.0` when displayed, but it is still smaller than 1. – cs95 Aug 22 '17 at 10:57
Thanks. So how can I display the figures at their real (not approximate) values and eliminate the rows with the maximum values? That was my question. – Samir Alhejaj Aug 22 '17 at 10:59
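If the goal in the last comment is to drop the rows that hold the column maximum, here is a sketch under that assumption. Because rows 5 to 7 store the identical double 0.9999999088637485, an exact comparison against .max() removes all three without any rounding. This only works because those values are exactly equal; values that are merely close would still need rounding or a tolerance.

```python
import pandas as pd

# Rebuilt from the question; the "time" column is left out for brevity.
df = pd.DataFrame({
    "index": ["210", "215", "220", "225", "230", "235", "240", "245"],
    "min_value": [0.99102639737063, 0.9906073078204338, 0.9905192216775841,
                  0.9893290692391012, 0.9999143781512333, 0.9999999088637485,
                  0.9999999088637485, 0.9999999088637485],
})

# Assumed goal: remove every row whose min_value equals the column maximum.
# Rows 5-7 hold the same stored double, so the exact comparison catches all three.
max_value = df["min_value"].max()
without_max = df[df["min_value"] < max_value]

print(max_value)
print(without_max)
```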
