I have two lists to start with:
delta = ['1','5']
taxa = ['2','3','4']
My dataframe will look like :
data = { 'id': [101,102,103,104,105],
'1_srcA': ['a', 'b','c', 'd', 'g'] ,
'1_srcB': ['a', 'b','c', 'd', 'e'] ,
'2_srcA': ['g', 'b','f', 'd', 'e'] ,
'2_srcB': ['a', 'b','c', 'd', 'e'] ,
'3_srcA': ['a', 'b','c', 'd', 'e'] ,
'3_srcB': ['a', 'b','1', 'd', 'm'] ,
'4_srcA': ['a', 'b','c', 'd', 'e'] ,
'4_srcB': ['a', 'b','c', 'd', 'e'] ,
'5_srcA': ['a', 'b','c', 'd', 'e'] ,
'5_srcB': ['m', 'b','c', 'd', 'e'] }
df = pd.DataFrame(data)
df
I have to do two types of checks on this dataframe. Say, Delta check and Taxa checks. For Delta checks, based on list delta = ['1','5'] I have to compare 1_srcA vs 1_srcB and 5_srcA vs 5_srcB since '1' is in 1_srcA ,1_srcB and '5' is in 5_srcA, 5_srcB . If the values differ, I have to populate 2. For tax checks (based on values from taxa list), it should be 1. If no difference, it is 0.
So, this comparison has to happen on all the rows. df is generated based on merge of two dataframes. so, there will be only two cols which has '1' in it, two cols which has '2' in it and so on.
Conditions I have to check:
- I need to check if columns containing values from delta list differs. If yes, I will populate 2.
- need to check if columns containing values from taxa list differs. If yes, I will populate 1.
- If condition 1 and condition 2 are satisfied, then populate 2.
- If none of the conditions satisfied, then 0.
So, my output should look like:
The code I tried:
df_cols_ = df.columns.tolist()[1:]
conditions = []
res = {}
for i,col in enumerate(df_cols_):
if (i == 0) or (i%2 == 0) :
continue
var = 'cond_'+str(i)
for del_col in delta:
if del_col in col:
var = var + '_F'
break
print (var)
cond = f"df.iloc[:, {i}] != df.iloc[:, {i+1}]"
res[var] = cond
conditions.append(cond)
The res dict will look like the below. But how can i use the condition to populate?
Is the any optimal solution the resultant dataframe can be derived? Thanks.