I'm trying to filter out null values when selecting from an HDF5 store, and I'd like to know how to write the where expression so that it returns only the non-null rows. For instance, let's say I have the following code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : ['foo','foo','bar','bar','baz'],
                    'B' : [1,2,1,2,np.nan],
                    'C' : np.random.randn(5) })

df.to_hdf('test.h5', 'df', mode='w', format='table', data_columns=True)

pd.read_hdf('test.h5', 'df')

     A   B         C
0  foo   1 -0.046065
1  foo   2 -0.987685
2  bar   1 -0.110967
3  bar   2 -1.989150
4  baz NaN  0.126864

I essentially want the equivalent of saying:

    pd.read_hdf('test.h5', 'df', where='B is not null')

How can I go about doing that?

Thanks!

MaxU - stand with Ukraine
halsdunes
  • This might help - http://stackoverflow.com/questions/3965104/not-none-test-in-python – hhh_ Jan 29 '16 at 17:36

2 Answers

It looks like it can't be done directly. Here is an ugly workaround for numeric columns:

pd.read_hdf('test.h5', 'df', where='B <= 0 | B > 0')
Stop harming Monica

I think it can be done this way:

pd.read_hdf('test.h5', 'df', where='B == B')
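
The where expressions suggested in these answers lean on the same property: under IEEE 754, any comparison involving NaN evaluates to False, so a test that is true for every real number is false for NaN. Here is a minimal sketch of that logic in plain Python, with no HDF5 involved. The names `b_values`, `range_test` and `self_equal` are made up for illustration, and the sketch does not verify whether the HDF5 query engine accepts either expression.

```python
import math

# Sample values mirroring column B from the question; NaN marks the missing entry.
b_values = [1.0, 2.0, 1.0, 2.0, math.nan]

# Every ordered or equality comparison involving NaN is False, so each
# predicate below is True for real numbers and False for NaN.
range_test = [(b <= 0) or (b > 0) for b in b_values]
self_equal = [b == b for b in b_values]

print(range_test)  # [True, True, True, True, False]
print(self_equal)  # [True, True, True, True, False]
```

If neither expression is accepted by the store, a fallback that should always work is to read the table without a where clause and call `dropna(subset=['B'])` on the result, at the cost of loading the null rows first.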
MaxU - stand with Ukraine