Pandas - Creating new column based on dynamic conditions from lists

Question

I have two lists to start with:

delta = ['1','5']
taxa = ['2','3','4']

My dataframe will look like :

data = { 'id': [101,102,103,104,105],
       '1_srcA': ['a', 'b','c', 'd', 'g'] ,
       '1_srcB': ['a', 'b','c', 'd', 'e'] ,
       '2_srcA': ['g', 'b','f', 'd', 'e'] ,
       '2_srcB': ['a', 'b','c', 'd', 'e'] ,
       '3_srcA': ['a', 'b','c', 'd', 'e'] ,
       '3_srcB': ['a', 'b','1', 'd', 'm'] ,
       '4_srcA': ['a', 'b','c', 'd', 'e'] ,
       '4_srcB': ['a', 'b','c', 'd', 'e'] ,
       '5_srcA': ['a', 'b','c', 'd', 'e'] ,
       '5_srcB': ['m', 'b','c', 'd', 'e'] }
df = pd.DataFrame(data)
df

I have to do two types of checks on this dataframe. Say, Delta check and Taxa checks. For Delta checks, based on list delta = ['1','5'] I have to compare 1_srcA vs 1_srcB and 5_srcA vs 5_srcB since '1' is in 1_srcA ,1_srcB and '5' is in 5_srcA, 5_srcB . If the values differ, I have to populate 2. For tax checks (based on values from taxa list), it should be 1. If no difference, it is 0.

So, this comparison has to happen on all the rows. df is generated based on merge of two dataframes. so, there will be only two cols which has '1' in it, two cols which has '2' in it and so on.

Conditions I have to check:

I need to check if columns containing values from delta list differs. If yes, I will populate 2.
need to check if columns containing values from taxa list differs. If yes, I will populate 1.
If condition 1 and condition 2 are satisfied, then populate 2.
If none of the conditions satisfied, then 0.

So, my output should look like:

The code I tried:

df_cols_ = df.columns.tolist()[1:]
conditions = []
res = {}
for i,col in enumerate(df_cols_):
    if (i == 0) or (i%2 == 0) :
        continue
    var = 'cond_'+str(i)
    for del_col in delta:
        if del_col in col:
            var = var + '_F'
            break
    print (var)
    cond = f"df.iloc[:, {i}] != df.iloc[:, {i+1}]"
    res[var] = cond
    conditions.append(cond)

The res dict will look like the below. But how can i use the condition to populate?

Is the any optimal solution the resultant dataframe can be derived? Thanks.

you need to be more specific on the conditions. you compare each row, right ? in that row you compare pairwise the columns with the same number (if the number is in delta). if ANY of the pairs are different ->2, if ALL pairs are equal -> 0, but what about the resulting 1 and the taxa list? — Rabinzel, Sep 22 '22 at 05:31
Thanks. Updated the taxa list now. delta and taxa list will be mutually exclusive — usr_lal123, Sep 22 '22 at 05:34
But it is not clear in which row to compare the cols of the values in delta list and when taxa list? There is only one desired output column and I can't see how these two lists are connected. And another thing: taxa list "(based on values from taxa list), it should be 1". based on what? — Rabinzel, Sep 22 '22 at 05:36

jezrael · Accepted Answer · 2022-09-22T06:04:59.773

Create helper function for filter values by DataFrame.filter and compare them for not equal, then use np.logical_or.reduce for processing list of boolean masks to one mask and pass to numpy.select:

delta = ['1','5']
taxa = ['2','3','4']

def f(x):
    df1 = df.filter(like=x)
    return df1.iloc[:, 0].ne(df1.iloc[:, 1])

d = np.logical_or.reduce([f(x) for x in delta])
print (d)
[ True False False False  True]

t = np.logical_or.reduce([f(x) for x in taxa])
print (t)
[ True False  True False  True]


df['res'] = np.select([d, t], [2, 1], default=0)
print (df)
    id 1_srcA 1_srcB 2_srcA 2_srcB 3_srcA 3_srcB 4_srcA 4_srcB 5_srcA 5_srcB  \
0  101      a      a      g      a      a      a      a      a      a      m   
1  102      b      b      b      b      b      b      b      b      b      b   
2  103      c      c      f      c      c      1      c      c      c      c   
3  104      d      d      d      d      d      d      d      d      d      d   
4  105      g      e      e      e      e      m      e      e      e      e   

   res  
0    2  
1    0  
2    1  
3    0  
4    2

Wow. Super. Thank you. Learnt few things new. – usr_lal123 Sep 22 '22 at 06:26 — usr_lal123, Sep 22 '22 at 06:26

Pandas - Creating new column based on dynamic conditions from lists

1 Answers1