What is difference between
pd.merge(staff_df, student_df, how='left', left_index=True, right_index=True)
and
pd.merge(staff_df, student_df, how='left', left_on='Name', right_on='Name')
What is difference between
pd.merge(staff_df, student_df, how='left', left_index=True, right_index=True)
and
pd.merge(staff_df, student_df, how='left', left_on='Name', right_on='Name')
First merge by indexes in both DataFrames
by left join:
pd.merge(staff_df, student_df, how='left', left_index=True, right_index=True)
And second and simplify third left join by column Name
in both DataFrames
:
pd.merge(staff_df, student_df, how='left', left_on='Name', right_on='Name')
pd.merge(staff_df, student_df, how='left', on='Name')
Samples:
staff_df = pd.DataFrame({'Name':list('abb'),
'B':[4,5,3],
'C':list('asd')}, index=list('ded'))
print (staff_df)
B C Name
d 4 a a
e 5 s b
d 3 d b
student_df = pd.DataFrame({'Name':list('abc'),
'A':[4,5,3],
'D':list('asd')}, index=list('dkl'))
print (student_df)
A D Name
d 4 a a
k 5 s b
l 3 d c
df = pd.merge(staff_df, student_df, how='left', left_index=True, right_index=True)
print (df)
B C Name_x A D Name_y
d 4 a a 4.0 a a
d 3 d b 4.0 a a
e 5 s b NaN NaN NaN
df = pd.merge(staff_df, student_df, how='left', on='Name')
print (df)
B C Name A D
0 4 a a 4 a
1 5 s b 5 s
2 3 d b 5 s
If student_df is
>>> student_df = pd.DataFrame({'name': ['x','y','z'], 'id': [1, 2, 3]})
>>> student_df
id name
0 1 x
1 2 y
2 3 z
And staff_df is something like
>>> staff_df = pd.DataFrame({'name': ['a','b','z'], 'id': [34, 35, 36]})
>>> staff_df
id name
0 34 a
1 35 b
2 36 z
left_index=True, right_index=True will merge 0th row of student_df with 0th row staff_df. For example this will be output of that
id_x name_x id_y name_y
0 1 x 34 a
1 2 y 35 b
2 3 z 36 z
It can also be that you're using index from left dataframe
and refrencing to id
of right dataframe
>>> pd.merge(student_df, staff_df, how='left',right_index=True,left_on='id')
id_x name_x id_y name_y
0 1 x 35.0 b
1 2 y 36.0 z
2 3 z NaN NaN
But when you use on
in both right and left, then those column values for rows are matched and merged on those,(quite to similar to using join columns).