1

I have two lists to start with:

delta = ['1','5']
taxa = ['2','3','4']

My dataframe will look like :

data = { 'id': [101,102,103,104,105],
       '1_srcA': ['a', 'b','c', 'd', 'g'] ,
       '1_srcB': ['a', 'b','c', 'd', 'e'] ,
       '2_srcA': ['g', 'b','f', 'd', 'e'] ,
       '2_srcB': ['a', 'b','c', 'd', 'e'] ,
       '3_srcA': ['a', 'b','c', 'd', 'e'] ,
       '3_srcB': ['a', 'b','1', 'd', 'm'] ,
       '4_srcA': ['a', 'b','c', 'd', 'e'] ,
       '4_srcB': ['a', 'b','c', 'd', 'e'] ,
       '5_srcA': ['a', 'b','c', 'd', 'e'] ,
       '5_srcB': ['m', 'b','c', 'd', 'e'] }
df = pd.DataFrame(data)
df

enter image description here

I have to do two types of checks on this dataframe. Say, Delta check and Taxa checks. For Delta checks, based on list delta = ['1','5'] I have to compare 1_srcA vs 1_srcB and 5_srcA vs 5_srcB since '1' is in 1_srcA ,1_srcB and '5' is in 5_srcA, 5_srcB . If the values differ, I have to populate 2. For tax checks (based on values from taxa list), it should be 1. If no difference, it is 0.

So, this comparison has to happen on all the rows. df is generated based on merge of two dataframes. so, there will be only two cols which has '1' in it, two cols which has '2' in it and so on.

Conditions I have to check:

  1. I need to check if columns containing values from delta list differs. If yes, I will populate 2.
  2. need to check if columns containing values from taxa list differs. If yes, I will populate 1.
  3. If condition 1 and condition 2 are satisfied, then populate 2.
  4. If none of the conditions satisfied, then 0.

So, my output should look like:

Expected Output

The code I tried:

df_cols_ = df.columns.tolist()[1:]
conditions = []
res = {}
for i,col in enumerate(df_cols_):
    if (i == 0) or (i%2 == 0) :
        continue
    var = 'cond_'+str(i)
    for del_col in delta:
        if del_col in col:
            var = var + '_F'
            break
    print (var)
    cond = f"df.iloc[:, {i}] != df.iloc[:, {i+1}]"
    res[var] = cond
    conditions.append(cond)

The res dict will look like the below. But how can i use the condition to populate?

enter image description here

Is the any optimal solution the resultant dataframe can be derived? Thanks.

usr_lal123
  • 650
  • 12
  • 28
  • you need to be more specific on the conditions. you compare each row, right ? in that row you compare pairwise the columns with the same number (if the number is in delta). if ANY of the pairs are different ->2, if ALL pairs are equal -> 0, but what about the resulting 1 and the taxa list? – Rabinzel Sep 22 '22 at 05:31
  • Thanks. Updated the taxa list now. delta and taxa list will be mutually exclusive – usr_lal123 Sep 22 '22 at 05:34
  • But it is not clear in which row to compare the cols of the values in delta list and when taxa list? There is only one desired output column and I can't see how these two lists are connected. And another thing: taxa list "(based on values from taxa list), it should be 1". based on what? – Rabinzel Sep 22 '22 at 05:36
  • @Rabinzel Added more details. Please check if it helps. – usr_lal123 Sep 22 '22 at 05:50

1 Answers1

1

Create helper function for filter values by DataFrame.filter and compare them for not equal, then use np.logical_or.reduce for processing list of boolean masks to one mask and pass to numpy.select:

delta = ['1','5']
taxa = ['2','3','4']

def f(x):
    df1 = df.filter(like=x)
    return df1.iloc[:, 0].ne(df1.iloc[:, 1])

d = np.logical_or.reduce([f(x) for x in delta])
print (d)
[ True False False False  True]

t = np.logical_or.reduce([f(x) for x in taxa])
print (t)
[ True False  True False  True]


df['res'] = np.select([d, t], [2, 1], default=0)
print (df)
    id 1_srcA 1_srcB 2_srcA 2_srcB 3_srcA 3_srcB 4_srcA 4_srcB 5_srcA 5_srcB  \
0  101      a      a      g      a      a      a      a      a      a      m   
1  102      b      b      b      b      b      b      b      b      b      b   
2  103      c      c      f      c      c      1      c      c      c      c   
3  104      d      d      d      d      d      d      d      d      d      d   
4  105      g      e      e      e      e      m      e      e      e      e   

   res  
0    2  
1    0  
2    1  
3    0  
4    2 
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252