1

What is difference between

pd.merge(staff_df, student_df, how='left', left_index=True, right_index=True)

and

pd.merge(staff_df, student_df, how='left', left_on='Name', right_on='Name')

2 Answers2

3

First merge by indexes in both DataFrames by left join:

pd.merge(staff_df, student_df, how='left', left_index=True, right_index=True)

And second and simplify third left join by column Name in both DataFrames:

pd.merge(staff_df, student_df, how='left', left_on='Name', right_on='Name')
pd.merge(staff_df, student_df, how='left', on='Name')

Samples:

staff_df = pd.DataFrame({'Name':list('abb'),
                   'B':[4,5,3],
                   'C':list('asd')}, index=list('ded'))

print (staff_df)
   B  C Name
d  4  a    a
e  5  s    b
d  3  d    b

student_df = pd.DataFrame({'Name':list('abc'),
                   'A':[4,5,3],
                   'D':list('asd')}, index=list('dkl'))

print (student_df)
   A  D Name
d  4  a    a
k  5  s    b
l  3  d    c

df = pd.merge(staff_df, student_df, how='left', left_index=True, right_index=True)
print (df)
   B  C Name_x    A    D Name_y
d  4  a      a  4.0    a      a
d  3  d      b  4.0    a      a
e  5  s      b  NaN  NaN    NaN

df = pd.merge(staff_df, student_df, how='left', on='Name')
print (df)
   B  C Name  A  D
0  4  a    a  4  a
1  5  s    b  5  s
2  3  d    b  5  s
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

If student_df is

>>> student_df = pd.DataFrame({'name': ['x','y','z'], 'id': [1, 2, 3]})
>>> student_df

    id name
0   1    x
1   2    y
2   3    z

And staff_df is something like

>>> staff_df = pd.DataFrame({'name': ['a','b','z'], 'id': [34, 35, 36]})
>>> staff_df

    id name
0   34    a
1   35    b
2   36    z

left_index=True, right_index=True will merge 0th row of student_df with 0th row staff_df. For example this will be output of that

     id_x name_x  id_y name_y
0     1      x    34      a
1     2      y    35      b
2     3      z    36      z

It can also be that you're using index from left dataframe and refrencing to id of right dataframe

>>> pd.merge(student_df, staff_df, how='left',right_index=True,left_on='id')
   id_x name_x  id_y name_y
0     1      x  35.0      b
1     2      y  36.0      z
2     3      z   NaN    NaN

But when you use on in both right and left, then those column values for rows are matched and merged on those,(quite to similar to using join columns).

harshil9968
  • 3,254
  • 1
  • 16
  • 26