Filter dataframe by logic

Question

I have this dataframe:

      MAT MAT_IN  SER D8
0   X     A1  Yes  -
1   X     A2   No  X
2   X     A3   No  X
3   X     A4  Yes  X
4   Y     B1   No  -
5   Y     B2   No  -
6   Y     B3   No  -
7   Y     B4   No  -
8   Y     B5   No  -
9   Z     K1  Yes  -
10   Z     K2   No  -
11   Z     K3   No  X
12   Z     K4  Yes  X

Where I would like to create a filter so I only get the first row. The filter have the following criteria:

Column "Ser" = "Yes" and Column "D8" = "-"
Rest of the rows in the group (Column "MAT", in this case X), must have Column "D8" = "X"

How do I apply such a filter on my dataframe?

My code:

import pandas as pd

data = {'MAT':['X','X','X','X','Y','Y','Y','Y','Y','Z','Z','Z','Z'],
       'MAT_IN':['A1','A2','A3','A4','B1','B2','B3','B4','B5','K1','K2','K3','K4'],
       'SER':['Yes','No','No','Yes','No','No','No','No','No','Yes','No','No','Yes'],
       'D8':['-','X','X','X','-','-','-','-','-','-','-','X','X']}

df1=pd.DataFrame(data, columns=['MAT','MAT_IN','SER','D8'])

df1

Any expected result are in Column D99:

MAT MAT_IN  SER D8 D99
0   X     A1  Yes  -  X
1   X     A2   No  X  -
2   X     A3   No  X  -
3   X     A4  Yes  X  -
4   Y     B1   No  -  -
5   Y     B2   No  -  -
6   Y     B3   No  -  -
7   Y     B4   No  -  -
8   Y     B5   No  -  -
9   Z     K1  Yes  -  -
10   Z     K2   No  -  -
11   Z     K3   No  X  -
12   Z     K4  Yes  X  -

you want the first row of each group to have Ser as 'Yes', DB as '-' and all the rest D8 as 'X' ? And what is your expected result? All the rows for such groups or just the first row? — IoaTzimas, Oct 12 '20 at 10:33
Does this answer your question? [Get first row of dataframe in Python Pandas based on criteria](https://stackoverflow.com/questions/40660088/get-first-row-of-dataframe-in-python-pandas-based-on-criteria) — deadshot, Oct 12 '20 at 10:34

jezrael · Accepted Answer · 2020-10-12T12:37:28.183

0

Create mask for compare for equal first value of group chaining by & for bitwise AND with inverted mask of method Series.duplicated and also chain last mask mask2 for remove first rows per groups by duplicated, compare by X and test by DataFrameGroupBy.all if all X per groups without first, for Series wit hsame size like original add Series.map:

mask = df1['MAT'].duplicated()
mask2 = df1['MAT'].map(df1.loc[mask, 'D8'].eq('X').groupby(df1['MAT']).all())
mask = df1['SER'].eq('Yes') & ~mask & df1['D8'].eq('-') & mask2
df1['D99'] = np.where(mask, 'X','-')
print (df1)
   MAT MAT_IN  SER D8 D99
0    X     A1  Yes  -   X
1    X     A2   No  X   -
2    X     A3   No  X   -
3    X     A4  Yes  X   -
4    Y     B1   No  -   -
5    Y     B2   No  -   -
6    Y     B3   No  -   -
7    Y     B4   No  -   -
8    Y     B5   No  -   -
9    Z     K1  Yes  -   -
10   Z     K2   No  -   -
11   Z     K3   No  X   -
12   Z     K4  Yes  X   -

edited Oct 12 '20 at 12:37

answered Oct 12 '20 at 10:36

jezrael

822,522
95
1,334
1,252

1

Nice with the extra column, then I can you a simple filter to filter out. Values in the new column "D99" for rows with MAT = Y are correct, but for MAT=X the D99=X should only have been applied for the first row where SER=Yes and D8=- (so opposite) – jvels Oct 12 '20 at 11:17
Alost there :) If row number 2 are changed, so D8 are = - instead of X and run the code. Then D99 should not have a X for the first row, because items there not have Ser=Yes and D8=- should have at least have D8=X – jvels Oct 12 '20 at 11:41
So df1['SER'].eq('Yes') & df1['D8'].eq('-') should have a X in D99 if the rest of the rows in the group by MAT have D8=X – jvels Oct 12 '20 at 11:53

Filter dataframe by logic

1 Answers1