Using two different data frames to compute new variable

Question

I have two dataframes of the same dimensions, each with an ID and a flag column.

In both dataframes I want to create a new variable that denotes an additive flag, so the new variable should look like this:

df1
ID  flag  new_flag
 0     1         1
 1     0         1
 2     1         1

df2
ID  flag  new_flag
 0     0         1
 1     1         1
 2     0         1

So if either flag column is 1, the new flag will be 1. I tried this code:

df1['new_flag']= 1
df2['new_flag']= 1

df1['new_flag'][(df1['flag']==0)&(df1['flag']==0)]=0
df2['new_flag'][(df2['flag']==0)&(df2['flag']==0)]=0

I would expect the same number of 1s in both new_flag columns, but they differ. Is this because I'm not going row by row, as in the question "pandas create new column based on values from other columns"? If so, how do I include criteria from both dataframes?
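A sketch of one likely reason the counts differ, using the example values from the tables above (the sample frames are assumed from those tables). In the attempt, both sides of `&` test the same dataframe's column, so the mask reduces to `flag == 0` for that frame alone and never looks at the other frame. Each `new_flag` therefore just copies its own `flag`. The snippet uses `.loc` instead of chained indexing so the assignment is guaranteed to hit the original frame.

```python
import pandas as pd

# Sample frames reproducing the tables in the question (assumed values).
df1 = pd.DataFrame({"ID": [0, 1, 2], "flag": [1, 0, 1]})
df2 = pd.DataFrame({"ID": [0, 1, 2], "flag": [0, 1, 0]})

# The original mask combines a condition with itself, so it is
# equivalent to df1["flag"] == 0 and never consults df2.
mask1 = (df1["flag"] == 0) & (df1["flag"] == 0)
mask2 = (df2["flag"] == 0) & (df2["flag"] == 0)

# Same logic as the attempt, written with .loc to avoid chained indexing.
df1["new_flag"] = 1
df1.loc[mask1, "new_flag"] = 0
df2["new_flag"] = 1
df2.loc[mask2, "new_flag"] = 0

# Each new_flag simply mirrors its own frame's flag, so the counts differ.
print(df1["new_flag"].tolist())  # [1, 0, 1]
print(df2["new_flag"].tolist())  # [0, 1, 0]
```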

Also, what you tried should in principle have worked, but it uses what's called chained indexing, so you may have been operating on a copy, which is why it didn't work. You should change it to `df1.loc[(df1['flag']==0)&(df1['flag']==0), 'new_flag']=0` and `df2.loc[(df2['flag']==0)&(df2['flag']==0), 'new_flag']=0` - EdChum, Sep 08 '16 at 15:00
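A minimal sketch of the `.loc` approach with the mask built from both dataframes (the expressions in the comment test the same frame's `flag` twice). It assumes, as in the question, that df1 and df2 share the same index and row order, because element-wise `&` between two Series aligns on index labels. The sample values are taken from the question's tables.

```python
import pandas as pd

# Sample frames from the question (assumed to share the same index).
df1 = pd.DataFrame({"ID": [0, 1, 2], "flag": [1, 0, 1]})
df2 = pd.DataFrame({"ID": [0, 1, 2], "flag": [0, 1, 0]})

# Rows where neither frame has the flag set.
both_zero = (df1["flag"] == 0) & (df2["flag"] == 0)

for df in (df1, df2):
    df["new_flag"] = 1                 # start with every row flagged
    df.loc[both_zero, "new_flag"] = 0  # clear it only where both flags are 0

print(df1["new_flag"].tolist())  # [1, 1, 1]
print(df2["new_flag"].tolist())  # [1, 1, 1]
```

Because `both_zero` is computed once from both frames, the two `new_flag` columns are guaranteed to agree.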

score 2 · Accepted Answer · answered Sep 08 '16 at 14:54

You can use np.logical_or to achieve this. So that we don't just get a column of 1s, set df1's flag to all 0s except for the last row. Then cast the result of np.logical_or with astype(int) to convert the boolean array to 1s and 0s:

In [108]:
import numpy as np
combined = np.logical_or(df1['flag'], df2['flag']).astype(int)
df1['new_flag'] = combined
df2['new_flag'] = combined
df1

Out[108]:
   ID  flag  new_flag
0   0     0         0
1   1     0         1
2   2     1         1

In [109]:
df2

Out[109]:
   ID  flag  new_flag
0   0     0         0
1   1     1         1
2   2     0         1
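For comparison, pandas' own element-wise `|` operator on boolean Series gives the same result here. This sketch reuses the answer's example data (df1 is 0 except for the last row). Comparing with `== 1` first keeps the mask strictly boolean, so it assumes the flags are coded as 0/1; note that `np.logical_or` instead treats any non-zero value as true. Like the answer, it assumes both frames share the same index.

```python
import pandas as pd

# The answer's example data: df1's flag is 0 except for the last row.
df1 = pd.DataFrame({"ID": [0, 1, 2], "flag": [0, 0, 1]})
df2 = pd.DataFrame({"ID": [0, 1, 2], "flag": [0, 1, 0]})

# True where either frame has flag == 1; cast to int to get 1/0.
combined = ((df1["flag"] == 1) | (df2["flag"] == 1)).astype(int)
df1["new_flag"] = combined
df2["new_flag"] = combined

print(df1["new_flag"].tolist())  # [0, 1, 1]
```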
