I have been trying to wrap my head around merge
for a while:
I have the following dataframes:
staff_df = pd.DataFrame([{'Name': 'Kelly', 'Role': 'Director of HR', 'Location': 'State Street'},
{'Name': 'Sally', 'Role': 'Course liasion', 'Location': 'Washington Avenue'},
{'Name': 'James', 'Role': 'Grader', 'Location': 'Washington Avenue'}])
student_df = pd.DataFrame([{'Name': 'James', 'School': 'Business', 'Location': '1024 Billiard Avenue'},
{'Name': 'Mike', 'School': 'Law', 'Location': 'Fraternity House #22'},
{'Name': 'Sally', 'School': 'Engineering', 'Location': '512 Wilson Crescent'}])
I understand that I can merge them in more ways than one:
pd.merge(staff_df, student_df, how='left', left_on='Name', right_on='Name')
pd.merge(student_df, staff_df, how='left', left_on='Name', right_on='Name')
pd.merge(staff_df, student_df, how='right', left_on='Name', right_on='Name')
pd.merge(student_df, staff_df, how='right', left_on='Name', right_on='Name')
Each produces a slightly different output. Can someone guide me on the proper way to understand how each output is constructed?
Specifically,
- Why are the role and school columns always between location_y?
- When is the role column beside the name column and when is the school column beside the name column?
I would hold off asking about using left_index
and right_on
in the same merge statement.
Thanks.