Indexing with list of columns and multiple conditions

Question

Trying to get columns that has zero for some columns but not zero for others.

import pandas as pd
import numpy as np
df=pd.DataFrame({'t':[0,0,0,np.nan],'t2':[0,0,2,3],'t3':[1,0,0,np.nan],'t4':[0,np.nan,2,3]})
zero=['t','t3']
nozero=['t2','t4']

Dataframe:

     t  t2   t3   t4
0  0.0   0  1.0  0.0
1  0.0   0  0.0  NaN
2  0.0   2  0.0  2.0
3  NaN   3  NaN  3.0

I have tried:

df[((df[zero]==0).any(axis=1))&((df[nozero]!=0).any(axis=1))]

which gives

     t  t2   t3   t4
1  0.0   0  0.0  NaN
2  0.0   2  0.0  2.0

also tried:

df[((df[zero]==0)&(df[nozero]!=0)).any(axis=1)]

which gives an empty dataframe.

Expected:

     t  t2   t3   t4
2  0.0   2  0.0  2.0

Any help would be great. Thanks.

EDIT FOR CLARIFICATION:

I need both conditions(ALL) to be true (zero==0 and nozero!=0) for each pair (t,t2 and t3,t4) but if the row has ANY of these pair true, I want that row.

For example from the following dataframe:

df = pd.DataFrame({'t': [0, 0, 11,0], 't2': [0, 0, 0, 0], 'z3': [0, 0, 12, 0], 't4': [0, 0, 2, 0],
                   't5': [0, 0, 0, 0], 'z6': [0, 0, 4, 0], 't7': [1, 1, 0, 1], 't8': [1, 1, 0, 1],
                   'z9': [1, 1, 0, 1], 't10': [1, 1, 0, 1], 't11': [1, 1, 0, 1], 'z12': [1, 1, 0, 1]})

nozero=['z3','z6','z9','z12']
zero=list(set(df.columns)-set(nozero))

    t  t2  z3  t4  t5  z6  t7  t8  z9  t10  t11  z12
0   0   0   0   0   0   0   1   1   1    1    1    1
1   0   0   0   0   0   0   1   1   1    1    1    1
2  11   0  12   2   0   4   0   0   0    0    0    0
3   0   0   0   0   0   0   1   1   1    1    1    1

I only want row 2 because any(t,t2) is zero and z3 is not zero.

EDIT FOR CLARIFICATION:

The following code generates the rows I want to keep (index=2). Is there a more efficient way than this using .any or .all? (as this requires appending the rows, allocating memory, looping, etc)

import pandas as pd

    df = pd.DataFrame({'t': [0, 0, 11,0], 't2': [0, 0, 0, 0], 'z3': [0, 0, 12, 0], 't4': [0, 0, 2, 0],
                       't5': [0, 0, 0, 0], 'z6': [0, 0, 4, 0], 't7': [1, 1, 0, 1], 't8': [1, 1, 0, 1],
                       'z9': [1, 1, 0, 1], 't10': [1, 1, 0, 1], 't11': [1, 1, 0, 1], 'z12': [1, 1, 0, 1]})
    nozero=['z3','z6','z9','z12']
    zero1=['t','t4','t7','t10']
    zero2=['t2','t5','t8','t11']
    for x,y,z in zip(zero1, zero2, nozero):
        print(df[((df[x]==0) | (df[y]==0)) & (df[z]!=0)])

Ananay Mital · Answer 1 · 2019-01-12T05:18:07.287

0

I think it's because of NaN. NaN != 0 is True

df[((df[zero]==0).any(axis=1))&((df[nozero]!=0).any(axis=1))].dropna()

df[((df[zero]==0).any(axis=1))&((df[nozero]!=0).any(axis=1))&(df[zero+nozero].notnull().all(axis=1))]

      t  t2   t3  t4
2   0.0   2  0.0   2




df

     t  t10 t11 t2  t4  t5  t7  t8  z12 z3  z6  z9
0    0    1   1  0   0   0   1   1    1  0   0   1 
1    0    0   0  0   0   0   0   0    1 12   4   1
2   11    1   0  0   2   0   1   0    0  0   0   0
3    0    1   1  0   0   0   1   1    1  0   0   1

df[((df[zero]==0).all(1))&((df[nozero]!=0).all(1))&(df[zero+nozero].notnull().all(1))]

    t t10 t11  t2  t4  t5  t7  t8 z12   z3  z6  z9
1   0   0   0   0   0   0   0   0   1   12   4   1

edited Jan 12 '19 at 05:18

answered Jan 11 '19 at 16:59

Ananay Mital

1,395
1
11
16

Just checking for zero==0 and nozero!=0 will allow NaN from nozero columns. You can check and remove all `NaN`s or `dropna` all together – Ananay Mital Jan 11 '19 at 18:32
edited the question with an example that doesn't work with your code – Ricky Kim Jan 11 '19 at 21:40
can you put up the expected output for the example you put up? It'll help me visualize – Ananay Mital Jan 12 '19 at 05:20
I want the row with index #2 one with t=11 – Ricky Kim Jan 12 '19 at 21:05
Hi, I edited the question with code that generates the row I want to keep but not efficient. Is there a better way using your method? – Ricky Kim Jan 14 '19 at 15:28

m13op22 · Answer 2 · 2019-01-14T15:41:39.440

0

Based on your discussion with @coldspeed, it seems you're working with a dataframe that looks more like

     t   t2   t3   t4
0  0.0  0.0  1.0  0.0
1  0.0  0.0  0.0  NaN
2  0.0  2.0  0.0  2.0
3  NaN  3.0  NaN  3.0
4  0.0  1.0  0.0  0.0

After converting the data types like @coldspeed suggested, you can go ahead and use this to select the data you want.

In [193]: df[((df[zero] == 0).all(1) & (df[nozero] != 0).any(1))]
Out[193]: 
     t   t2   t3   t4
1  0.0  0.0  0.0  NaN
2  0.0  2.0  0.0  2.0
4  0.0  1.0  0.0  0.0

The first condition (df[zero]==0).all(1) finds all values in the zero columns that are 0 (all(1) tells pandas to reduce by column). The second condition finds any values in the nozero columns that are not zero, which as @Ananay said, includes NaN.

Like @Ananay said in their answer, you can remove rows with NaN values by using dropna() so that the final result is

In [194]: df[((df[zero] == 0).all(1) & (df[nozero] != 0).any(1))].dropna()
Out[194]: 
     t   t2   t3   t4
2  0.0  2.0  0.0  2.0
4  0.0  1.0  0.0  0.0

I'd also suggest reading more about the .any() and .all() methods.

EDIT

I can do

df[((df['t']==0) | (df['t2']==0)) & (df['z3']!=0)]

but I don't know if you want to get that specific. The problem is just finding the right boolean conditions for my previous method. We want row 3 to return True and the others to return False. You could try renaming this question to "Boolean indexing to find values based on conditions for list of columns" or search for that.

edited Jan 14 '19 at 15:41

answered Jan 11 '19 at 20:31

m13op22

2,168
2
16
35

edited the question with an example that doesn't work with your code – Ricky Kim Jan 11 '19 at 21:40
@RickyKim Is this new dataframe an exact replica of the dataframe you're working with? – m13op22 Jan 11 '19 at 22:50
Yea I have now. It's the exact content of the dataframe (first 4 rows) – Ricky Kim Jan 11 '19 at 23:55
@RickyKim so if any column in "zero" has a zero AND any column in "nozero" has a value that is not zero, you want that row? – m13op22 Jan 13 '19 at 17:52
yes I want the row with t=11 index=2. if any(t,t2 pair is zero and z not zero) i want that row – Ricky Kim Jan 13 '19 at 18:37
Hi, I edited the question with code that generates the row I want to keep but not efficient. Is there a better way using your method? – Ricky Kim Jan 14 '19 at 15:28

Ricky Kim · Accepted Answer · 2019-01-14T16:48:19.117

As I could not get the desired rows with .any or .all, I wrote a workaround code. If you find any more efficient way without looping through the columns, let me know. Thanks for all the help @Ananay Mital, @m42op64, @coldspeed.

import pandas as pd

df = pd.DataFrame({'t': [0, 0, 11,0], 't2': [0, 0, 0, 0], 'z3': [0, 0, 12, 0], 't4': [0, 0, 2, 0],
                   't5': [0, 0, 0, 0], 'z6': [0, 0, 4, 0], 't7': [1, 1, 0, 1], 't8': [1, 1, 0, 1],
                   'z9': [1, 1, 0, 1], 't10': [1, 1, 0, 1], 't11': [1, 1, 0, 1], 'z12': [1, 1, 0, 1]})
nozero=['z3','z6','z9','z12']
zero1=['t','t4','t7','t10']
zero2=['t2','t5','t8','t11']
indices=set()
for x,y,z in zip(zero1, zero2, nozero):
    i=df[((df[x]==0) | (df[y]==0)) & (df[z]!=0)].index.values
    indices.update(i)
print(df.loc[list(indices)])

OUTPUT

    t  t2  z3  t4  t5  z6  t7  t8  z9  t10  t11  z12
2  11   0  12   2   0   4   0   0   0    0    0    0

Indexing with list of columns and multiple conditions

3 Answers3