0

Trying to get columns that has zero for some columns but not zero for others.

import pandas as pd
import numpy as np
df=pd.DataFrame({'t':[0,0,0,np.nan],'t2':[0,0,2,3],'t3':[1,0,0,np.nan],'t4':[0,np.nan,2,3]})
zero=['t','t3']
nozero=['t2','t4']

Dataframe:

     t  t2   t3   t4
0  0.0   0  1.0  0.0
1  0.0   0  0.0  NaN
2  0.0   2  0.0  2.0
3  NaN   3  NaN  3.0

I have tried:

df[((df[zero]==0).any(axis=1))&((df[nozero]!=0).any(axis=1))]

which gives

     t  t2   t3   t4
1  0.0   0  0.0  NaN
2  0.0   2  0.0  2.0

also tried:

df[((df[zero]==0)&(df[nozero]!=0)).any(axis=1)]

which gives an empty dataframe.

Expected:

     t  t2   t3   t4
2  0.0   2  0.0  2.0

Any help would be great. Thanks.

EDIT FOR CLARIFICATION:

I need both conditions(ALL) to be true (zero==0 and nozero!=0) for each pair (t,t2 and t3,t4) but if the row has ANY of these pair true, I want that row.

For example from the following dataframe:

df = pd.DataFrame({'t': [0, 0, 11,0], 't2': [0, 0, 0, 0], 'z3': [0, 0, 12, 0], 't4': [0, 0, 2, 0],
                   't5': [0, 0, 0, 0], 'z6': [0, 0, 4, 0], 't7': [1, 1, 0, 1], 't8': [1, 1, 0, 1],
                   'z9': [1, 1, 0, 1], 't10': [1, 1, 0, 1], 't11': [1, 1, 0, 1], 'z12': [1, 1, 0, 1]})

nozero=['z3','z6','z9','z12']
zero=list(set(df.columns)-set(nozero))

    t  t2  z3  t4  t5  z6  t7  t8  z9  t10  t11  z12
0   0   0   0   0   0   0   1   1   1    1    1    1
1   0   0   0   0   0   0   1   1   1    1    1    1
2  11   0  12   2   0   4   0   0   0    0    0    0
3   0   0   0   0   0   0   1   1   1    1    1    1

I only want row 2 because any(t,t2) is zero and z3 is not zero.

EDIT FOR CLARIFICATION:

The following code generates the rows I want to keep (index=2). Is there a more efficient way than this using .any or .all? (as this requires appending the rows, allocating memory, looping, etc)

import pandas as pd

    df = pd.DataFrame({'t': [0, 0, 11,0], 't2': [0, 0, 0, 0], 'z3': [0, 0, 12, 0], 't4': [0, 0, 2, 0],
                       't5': [0, 0, 0, 0], 'z6': [0, 0, 4, 0], 't7': [1, 1, 0, 1], 't8': [1, 1, 0, 1],
                       'z9': [1, 1, 0, 1], 't10': [1, 1, 0, 1], 't11': [1, 1, 0, 1], 'z12': [1, 1, 0, 1]})
    nozero=['z3','z6','z9','z12']
    zero1=['t','t4','t7','t10']
    zero2=['t2','t5','t8','t11']
    for x,y,z in zip(zero1, zero2, nozero):
        print(df[((df[x]==0) | (df[y]==0)) & (df[z]!=0)])
Ricky Kim
  • 1,992
  • 1
  • 9
  • 18

3 Answers3

0

I think it's because of NaN. NaN != 0 is True

df[((df[zero]==0).any(axis=1))&((df[nozero]!=0).any(axis=1))].dropna()

df[((df[zero]==0).any(axis=1))&((df[nozero]!=0).any(axis=1))&(df[zero+nozero].notnull().all(axis=1))]

      t  t2   t3  t4
2   0.0   2  0.0   2




df

     t  t10 t11 t2  t4  t5  t7  t8  z12 z3  z6  z9
0    0    1   1  0   0   0   1   1    1  0   0   1 
1    0    0   0  0   0   0   0   0    1 12   4   1
2   11    1   0  0   2   0   1   0    0  0   0   0
3    0    1   1  0   0   0   1   1    1  0   0   1

df[((df[zero]==0).all(1))&((df[nozero]!=0).all(1))&(df[zero+nozero].notnull().all(1))]

    t t10 t11  t2  t4  t5  t7  t8 z12   z3  z6  z9
1   0   0   0   0   0   0   0   0   1   12   4   1
Ananay Mital
  • 1,395
  • 1
  • 11
  • 16
0

Based on your discussion with @coldspeed, it seems you're working with a dataframe that looks more like

     t   t2   t3   t4
0  0.0  0.0  1.0  0.0
1  0.0  0.0  0.0  NaN
2  0.0  2.0  0.0  2.0
3  NaN  3.0  NaN  3.0
4  0.0  1.0  0.0  0.0

After converting the data types like @coldspeed suggested, you can go ahead and use this to select the data you want.

In [193]: df[((df[zero] == 0).all(1) & (df[nozero] != 0).any(1))]
Out[193]: 
     t   t2   t3   t4
1  0.0  0.0  0.0  NaN
2  0.0  2.0  0.0  2.0
4  0.0  1.0  0.0  0.0

The first condition (df[zero]==0).all(1) finds all values in the zero columns that are 0 (all(1) tells pandas to reduce by column). The second condition finds any values in the nozero columns that are not zero, which as @Ananay said, includes NaN.

Like @Ananay said in their answer, you can remove rows with NaN values by using dropna() so that the final result is

In [194]: df[((df[zero] == 0).all(1) & (df[nozero] != 0).any(1))].dropna()
Out[194]: 
     t   t2   t3   t4
2  0.0  2.0  0.0  2.0
4  0.0  1.0  0.0  0.0

I'd also suggest reading more about the .any() and .all() methods.

EDIT

I can do

df[((df['t']==0) | (df['t2']==0)) & (df['z3']!=0)]

but I don't know if you want to get that specific. The problem is just finding the right boolean conditions for my previous method. We want row 3 to return True and the others to return False. You could try renaming this question to "Boolean indexing to find values based on conditions for list of columns" or search for that.

m13op22
  • 2,168
  • 2
  • 16
  • 35
0

As I could not get the desired rows with .any or .all, I wrote a workaround code. If you find any more efficient way without looping through the columns, let me know. Thanks for all the help @Ananay Mital, @m42op64, @coldspeed.

import pandas as pd

df = pd.DataFrame({'t': [0, 0, 11,0], 't2': [0, 0, 0, 0], 'z3': [0, 0, 12, 0], 't4': [0, 0, 2, 0],
                   't5': [0, 0, 0, 0], 'z6': [0, 0, 4, 0], 't7': [1, 1, 0, 1], 't8': [1, 1, 0, 1],
                   'z9': [1, 1, 0, 1], 't10': [1, 1, 0, 1], 't11': [1, 1, 0, 1], 'z12': [1, 1, 0, 1]})
nozero=['z3','z6','z9','z12']
zero1=['t','t4','t7','t10']
zero2=['t2','t5','t8','t11']
indices=set()
for x,y,z in zip(zero1, zero2, nozero):
    i=df[((df[x]==0) | (df[y]==0)) & (df[z]!=0)].index.values
    indices.update(i)
print(df.loc[list(indices)])

OUTPUT

    t  t2  z3  t4  t5  z6  t7  t8  z9  t10  t11  z12
2  11   0  12   2   0   4   0   0   0    0    0    0
Ricky Kim
  • 1,992
  • 1
  • 9
  • 18