The Problem

I'm attempting to search through a pandas dataframe to find a single value. The dataframe columns I'm searching through are of type float64.

Working Example

Here is a working example of what I'd like, using a Series of dtype int64.

myseries = pd.Series([1,4,0,7,5], index=[0,1,2,3,4])
myseries

The output is the following:

0    1
1    4
2    0
3    7
4    5
dtype: int64

Now for the search:

myseries == 4

Results:

0    False
1     True
2    False
3    False
4    False
dtype: bool

Not Working Example

Here is a sample of my data.

df['difference']

Result

0    -2.979296
1    -0.423903
2     0.396515
...
48    0.450493
49   -1.216324
Name: priceDiff1, dtype: float64

As you can see, it is of type float64. Here's the issue: if I copy the value shown in row 2 and write the same kind of comparison as before, row 2 does not come back as True.

df['difference'] == 0.396515

Output

0     False
1     False
2     False
...
48    False
49    False
Name: priceDiff1, dtype: bool

Row 2 should be True. Any assistance with this issue would be great. What I believe is happening is that my query isn't treating the value as float64 and might be assuming a different type. I tested this by downcasting the column from float64 to float32, with no luck.

  • Is this a "comparison of floats" issue? e.g https://stackoverflow.com/questions/5595425/what-is-the-best-way-to-compare-floats-for-almost-equality-in-python – DavidG Feb 16 '18 at 00:18
  • I tried the math.isclose() function and it didn't work either... Maybe the data has more decimal places than are being displayed? I'm lost. – Carlos Santana Feb 16 '18 at 00:29
  • df['difference'][2] returns 0.396515231. Okay, so I was right: when I display the dataframe, it shows fewer significant digits than are actually stored. – Carlos Santana Feb 16 '18 at 00:31
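
The display-precision effect described in the comments above can be reproduced with a small sketch. Only the value 0.396515231 comes from the thread; the Series name `s` and the other two values are placeholders chosen for illustration.

```python
import pandas as pd

# Hypothetical data: only 0.396515231 is taken from the comment above.
s = pd.Series([1.5, 0.396515231, -2.25])

# The default display precision is six decimal places, which hides the trailing digits.
default_view = str(s)

# Temporarily raise the display precision to see more of the stored value.
with pd.option_context("display.precision", 10):
    full_view = str(s)

# repr() of a single element shows the stored float in full.
stored = repr(float(s.iloc[1]))

# Comparing against the displayed (truncated) value matches nothing.
exact_match = s == 0.396515

print(default_view, full_view, stored, exact_match, sep="\n\n")
```

Changing the display option only affects how the values are printed; the stored data is the same in both views.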

2 Answers

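You want to use NumPy's isclose, which compares within a tolerance instead of testing for exact equality (here s is the float64 Series being searched):

import numpy as np
np.isclose(s, 0.396515)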

array([False, False,  True, False, False, False], dtype=bool)
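
As a rough sketch of how that result might be used, the boolean array can serve as a row mask. The DataFrame below is a stand-in, not the asker's data: only 0.396515231 comes from the question's comments, and the explicit tolerance in the second call is an assumption about what "close enough" means for six displayed decimals.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; 0.396515231 is the stored value reported in the comments.
df = pd.DataFrame({"difference": [1.5, 0.396515231, -2.25]})

# np.isclose(a, b) tests |a - b| <= atol + rtol * |b|
# (defaults: rtol=1e-05, atol=1e-08).
mask = np.isclose(df["difference"], 0.396515)

# Assumed alternative: a purely absolute tolerance of half a unit
# in the sixth decimal place.
mask_tight = np.isclose(df["difference"], 0.396515, rtol=0, atol=5e-7)

# Use the mask to select the matching rows.
matches = df[mask]
print(matches)
```

Note that the default relative tolerance scales with the size of the target value, so for very large or very small numbers it may be worth setting rtol and atol explicitly.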
piRSquared

Your pandas Series stores numeric data as binary floating-point numbers, not as decimals, so the six decimal places you see printed are only a rounded view of the stored value.

Here is a trivial example:

import pandas as pd

s = pd.Series([1/3, 1/7, 2, 1/11, 1/3])

# 0    0.333333
# 1    0.142857
# 2    2.000000
# 3    0.090909
# 4    0.333333
# dtype: float64

s.iloc[0] == 0.333333  # False
s.iloc[0] == 1/3       # True

As @piRSquared explains, use np.isclose for such comparisons. Or, alternatively, round your data to a fixed number of decimal places.
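
A minimal sketch of the rounding alternative, reusing the Series from the example above. The choice of six decimal places is an assumption that matches the default pandas display precision.

```python
import pandas as pd

s = pd.Series([1/3, 1/7, 2, 1/11, 1/3])

# Round to six decimal places, then compare exactly against the rounded target.
rounded = s.round(6)
mask = rounded == 0.333333

print(s[mask])
```

Rounding discards information, so it is probably best applied to a copy used only for the comparison rather than to the original column. np.isclose is the more forgiving option when the number of meaningful decimal places is not known in advance.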

jpp