Merge two dataframes with special condition: rewrite how function of pandas.merge

Question

I have two dataframes sharing the same indices. I would like to get a third dataframe which contains the data of the first dataframe for the rows which are not null and the rows from the second dataframe otherwise. I could that if there was an option of rewriting the how parameter of the pandas.merge.

Maybe there is a fast way of doing this, which is not obvious at the moment.

EDIT: The indices overlapping in the dataframes are the same, but it might be that some indices appear in one dataframe but not the other. This makes iterating over the indices a costly solution.

So here it is with the data: df_1 and df_2 as inputs and df_3 as the output.

df_1=
             0    1    2    3
bar   one  NaN  NaN  NaN  NaN
baz   two   AI   BI   CI   DI
foo   one   AJ   BJ   CJ   DJ
      two   AK   BK   CK   DK
qux   one   AL   BL   CL   DL
      two   AM   BM   CM   DM
other one  NaN  NaN  NaN  NaN

df_2=

          0   1   2   3
bar one  AA  BA  CA  DA
    two  AB  BB  CB  DB
baz one  AC  BC  CC  DC
    two  AD  BD  CD  DD
foo one  AE  BE  CE  DE
    two  AF  BF  CF  DF
qux two  AG  BG  CG  DG

df_3=

             0    1    2     3
bar   one   AA   BA   CA    DA
      two   AB   BB   CB    DB
baz   one   AC   BC   CC    DC
      two   AI   BI   CI    DI
foo   one   AJ   BJ   CJ    DJ
      two   AK   BK   CK    DK
qux   one   AL   BL   CL    DL
      two   AM   BM   CM    DM
other one  NaN  NaN  NaN  None

does this help: https://stackoverflow.com/questions/30296798/merge-only-when-value-is-empty-null-in-pandas? — Pygirl, Jan 28 '21 at 11:23
you can extend it for multile columns too. I can't think of any better solution right now — Pygirl, Jan 28 '21 at 11:37

Pygirl · Answer 1 · 2021-01-28T12:50:17.867

I can think of only this:

col = df1.columns

def myfunc(x):
    if (df2.index.isin([x.name]).any() and pd.isnull(x[col]).all()):
        return pd.Series(df2.loc[x.name].values)
    else: 
        return pd.Series(x[col].values)
        
df1 = df1.apply(myfunc,axis=1)

df1:

            0   1   2   3
bar   one   AA  BA  CA  DA
baz   two   AI  BI  CI  DI
foo   one   AJ  BJ  CJ  DJ
two   AK    BK  CK  DK  None
qux   one   AL  BL  CL  DL
two   AM    BM  CM  DM  None
other one   NaN NaN NaN NaN

Feel free to modify this code.

Merge two dataframes with special condition: rewrite how function of pandas.merge

1 Answers1