0

I don't see the below case mentioned in Pandas Merging 101.



I'm having trouble understanding the Pandas documentation for doing a left outer join.

import pandas as pd

left_df = pd.DataFrame({ 
                       'user_id': ['Peter', 'John', 'Robert', 'Anna']
                       
                      })

right_df = pd.DataFrame({'user_id': ['Paul', 'Mary', 'John',
                                     'Anna']
                       })


pd.merge(left_df, right_df, on = 'user_id', how = 'left')

Output is:

    user_id
0   Peter
1   John
2   Robert
3   Anna

Expected output:

    user_id
0   Peter
1   Robert

What am I missing? Is the indicator = True parameter a must (to create a _merge column to filter on) for left outer joins?

foobash
  • 73
  • 5
  • Because you asked to get all keys from left dataframe even if there is no match with right dataframe. What you want is `how='inner'`, i.e. the intesection between keys of both dataframe. – Corralien Feb 09 '23 at 19:49
  • @Corralien Sorry, I typed my Expected output wrong. Updated it. "Peter" and "Robert" would be the 'left outer join'....correct? – foobash Feb 09 '23 at 20:05
  • Now you have fixed your output, `merge` is not the better option to get what you want. – Corralien Feb 09 '23 at 20:15

1 Answers1

1

You can use merge with indicator=True and keep only rows where value is set to left_only but it's not the best way. You can use isin to get a boolean mask then invert it:

>>> left_df[~left_df['user_id'].isin(right_df['user_id'])]

  user_id
0   Peter
2  Robert

With merge:

>>> (left_df.merge(right_df, on='user_id', how='left', indicator='present')
            .loc[lambda x: x.pop('present') == 'left_only'])

  user_id
0   Peter
2  Robert
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • @Corraliean Thank you, I understood the indicator solution....but is it a must with merge for a left outer join? In other words...what is my syntax saying with how = 'left'? "Just give me back left_df"? I like your first answer too...reminds me of the R language. – foobash Feb 09 '23 at 20:18
  • And I agree, your first solution is better. I'm just wondering in the context of Pandas Merging 101. – foobash Feb 09 '23 at 20:21
  • Your request does not really match with the use of `merge`. Generally, we try to keep common lines (how='inner') or left rows (how='left') / right rows (how='left') whatever the other dataframe, or all lines (how='outer') of both dataframes. You can also get the cartesian product (how='cross') of all dataframes. – Corralien Feb 09 '23 at 20:58