I have a table in an HDFStore with a column of floats f stored as a data_column. I would like to select a subset of rows where, e.g., f==0.6.

I'm running into trouble that I assume is related to a floating-point precision mismatch somewhere. Here is an example:

In [1]: f = np.arange(0, 1, 0.1)

In [2]: s = f.astype('S')

In [3]: df = pd.DataFrame({'f': f, 's': s})

In [4]: df
Out[4]: 
     f    s
0  0.0  0.0
1  0.1  0.1
2  0.2  0.2
3  0.3  0.3
4  0.4  0.4
5  0.5  0.5
6  0.6  0.6
7  0.7  0.7
8  0.8  0.8
9  0.9  0.9

[10 rows x 2 columns]

In [5]: with pd.get_store('test.h5', mode='w') as store:
   ...:     store.append('df', df, data_columns=True)
   ...:     

In [6]: with pd.get_store('test.h5', mode='r') as store:
   ...:     selection = store.select('df', 'f=f')
   ...:     

In [7]: selection
Out[7]: 
     f    s
0  0.0  0.0
1  0.1  0.1
2  0.2  0.2
4  0.4  0.4
5  0.5  0.5
8  0.8  0.8
9  0.9  0.9

[7 rows x 2 columns]

I would like the query to return all of the rows but instead several are missing. A query with where='f=0.3' returns an empty table:

In [8]: with pd.get_store('test.h5', mode='r') as store:
   ...:     selection = store.select('df', 'f=0.3')
   ...:     

In [9]: selection
Out[9]: 
Empty DataFrame
Columns: [f, s]
Index: []

[0 rows x 2 columns]

I'm wondering whether this is the intended behavior and, if so, whether there is a simple workaround, such as setting a precision limit for floating-point queries in pandas. I'm using version 0.13.1:

In [10]: pd.__version__
Out[10]: '0.13.1-55-g7d3e41c'
mcwitt

1 Answer

I don't think so, no. Pandas is built around numpy, and I have never seen any tools for approximate float equality except testing utilities like assert_allclose, and that won't help here.
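
To see why exact equality is fragile here, the following sketch compares each element of np.arange(0, 1, 0.1) with the short decimal literal it prints as. It uses plain NumPy only and says nothing about how pandas or PyTables parse the where string internally; the names literals and mismatched are just for illustration.

```python
import numpy as np

# Same array as in the question: element i is computed as i * 0.1.
f = np.arange(0, 1, 0.1)

# The decimal literals a user would naturally type in a query: 0.0, 0.1, ..., 0.9.
literals = np.array([float("0.%d" % i) for i in range(10)])

# Indices where the stored float is NOT bit-for-bit equal to the typed literal.
mismatched = [int(i) for i in np.flatnonzero(f != literals)]

for i in mismatched:
    # repr shows the full round-trip value, not the shortened display form.
    print(i, repr(float(f[i])), "vs", repr(float(literals[i])))
```

The mismatched indices are 3, 6 and 7, which are the same rows that are missing from Out[7] in the question. For example, f[3] is 0.30000000000000004 rather than 0.3, so an exact comparison against the literal 0.3 cannot match it.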

The best you can do is something like:

In [17]: with pd.get_store('test.h5', mode='r') as store:
   ....:     selection = store.select('df', '(f > 0.2) & (f < 0.4)')
   ....:     

In [18]: selection
Out[18]: 
     f    s
3  0.3  0.3

If this is a common idiom for you, make a function for it. You can even get fancy by incorporating numpy float precision.
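
A minimal sketch of such a function might look like the following. The names approx_bounds and approx_eq_where and the default tolerances are hypothetical choices, not part of pandas; the function only builds a where string of the same form as the range query above, with machine epsilon from np.finfo used as a floor on the tolerance.

```python
import numpy as np

# Machine epsilon for float64, used as the smallest allowed half-width.
EPS = float(np.finfo(np.float64).eps)


def approx_bounds(value, rel_tol=1e-9, abs_tol=EPS):
    """Return (lo, hi) bracketing `value`; tolerances are illustrative defaults."""
    delta = max(abs_tol, rel_tol * abs(value))
    # Convert to plain Python floats so repr() gives a bare numeric literal.
    return float(value - delta), float(value + delta)


def approx_eq_where(column, value, rel_tol=1e-9, abs_tol=EPS):
    """Build a where string selecting rows with `column` close to `value`."""
    lo, hi = approx_bounds(value, rel_tol, abs_tol)
    return "({c} >= {lo!r}) & ({c} <= {hi!r})".format(c=column, lo=lo, hi=hi)


# Check the bounds against the array from the question, in memory.
f = np.arange(0, 1, 0.1)
lo, hi = approx_bounds(0.3)
matches = [int(i) for i in np.flatnonzero((f >= lo) & (f <= hi))]

where = approx_eq_where("f", 0.3)
print(where)
print(matches)
```

The resulting string is intended to be passed as the where argument, e.g. store.select('df', approx_eq_where('f', 0.3)). If the query machinery shortens the float literals when it parses the string, a wider rel_tol may be needed, so treat the defaults as a starting point.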

Dan Allan