create new column that telling the values why they are not selected on pandas?

Question

Input

df=pd.DataFrame({'Name':['JOHN','ALLEN','BOB','NIKI','CHARLIE','CHANG'],
              'Age':[35,42,63,29,47,51],
              'Salary_in_1000':[100,93,78,120,64,115],
             'FT_Team':['STEELERS','SEAHAWKS','FALCONS','FALCONS','PATRIOTS','STEELERS']})


n1=(df['Age']< 60)
n2=(df['Salary_in_1000']>=100) 
n3=(df['FT_Team'].str.startswith('S'))

Using these conditions to select, it will return JOHN and CHANG.

Goal

I want to create dataframe where data is not selected and a new column which returns which conditions is not expected. For example,

* ALLEN: n1, n2
* BOB: n2,n3
* NIKI: n3
* CHANG: n2,n3

The new column name is reason. The value is the condition variable and the type is string.

Try

I have to try each condition and record each variable violates which rules by hand.

Umar.H · Accepted Answer · 2021-05-03T00:45:00.210

3

create a new dataframe then use .dot matrix on the boolean values and the column names.

s = pd.DataFrame({'n1' : n1, 'n2' : n2, 'n3' : n3})

df['reason'] = s.eq(False).dot(s.columns +',').str.rstrip(',')

print(df)
      Name  Age  Salary_in_1000   FT_Team    reason
0     JOHN   35             100  STEELERS          
1    ALLEN   42              93  SEAHAWKS        n2
2      BOB   63              78   FALCONS  n1,n2,n3
3     NIKI   29             120   FALCONS        n3
4  CHARLIE   47              64  PATRIOTS     n2,n3
5    CHANG   51             115  STEELERS

edited May 03 '21 at 00:45

answered May 03 '21 at 00:14

Umar.H

22,559
7
39
74

very nice +1, but i think he wants the inverted conditions – tdy May 03 '21 at 00:15
1

@tdy thank you my friend, easy enough to invert! although one could just write the correct boolean to begin with! – Umar.H May 03 '21 at 00:19
@Umar.H great, but if there are more than 10 rules , it's not convenient to create rules dataframe. – Jack May 03 '21 at 01:09
1

@Jack if you pass the conditions to a list you could create a dictionary i.e `s = pd.DataFrame({ f"n{i}" : data for i,data in enumerate([rules]) })` – Umar.H May 03 '21 at 02:59

Shubham Periwal · Answer 2 · 2021-05-03T00:08:01.243

You can simply check for each row and assign the value to the column like this:

def get_reason(row):
    n1 = row['Age'] < 60
    n2= row['Salary_in_1000']>=100
    n3= row['FT_Team'].startswith('S')
    
    conditions = {'n1': n1,'n2': n2,'n3': n3}

    #Get the conditions which were violated
    violated = [x for x in conditions if not conditions[x]]
    
    #join to string
    return ",".join(violated)

df['reasons'] = df.apply(get_reason, axis=1)

Output:

Name    Age Salary_in_1000  FT_Team     reasons
JOHN    35  100             STEELERS    
ALLEN   42  93              SEAHAWKS    n2
BOB     63  78              FALCONS     n1,n2,n3
NIKI    29  120             FALCONS     n3
CHARLIE 47  64              PATRIOTS    n2,n3
CHANG   51  115             STEELERS

create new column that telling the values why they are not selected on pandas?

2 Answers2

Linked