-1

I have a disagreement with a colleague over the below two answers so need a third opinion.

Suppose you have 2 data frames: Salary and Employee.

Question: Which command would you use to join Employee and Salary by matching the rows from Salary to Employee?

Employee %>% left_join(Salary, by=c("F_NAME"="NAME")) 

or

Employee %>% right_join(Salary, by=c("F_NAME"="NAME")) 
markus
  • 25,843
  • 5
  • 39
  • 58
analyst045
  • 628
  • 7
  • 21
  • Hi. You don't say what you want to accomplish in full detail so how can we know the answer? From your use of "join" in the text one would expect you want an inner join, not a (left or right) (or full?) outer join. Please read & act on [mcve]. But the answer to this question is trivially found, please don't ask unresearched or duplicate questions. See [ask] & the downvote arrow mouseover text. – philipxy Sep 08 '18 at 19:06

2 Answers2

0

Both of these commands will work, assuming that Employee$F_NAME and Salary$NAME contain matching items. The difference is in how rows that do not have matches are handled.

left_join will retain all rows in Employee. For rows that are in Employee but not Salary, any columns unique to Salary will be filled with NA.

right_join will retain all rows in Salary. For rows that are in Salary but not Employee, any columns unique to Employee will be filled with NA.

inner_join will retain only rows that are matched in both Salary and Employee. All others are dropped.

full_join will retain all rows from both data frames. Any rows that are not matched will have their missing left- or right-side columns filled with NA.

See also: some very nice illustrations about join types.

jdobres
  • 11,339
  • 1
  • 17
  • 37
0

This is actually more specifically related to dplyr as opposed to the native R merge. When you use

Employee %>% left_join(Salary, by=c("F_NAME"="NAME")) 

you are concatenating the rows in Employee with all columns from Employee and Salary. Missing values will be given NA. Similarly,

Employee %>% right_join(Salary, by=c("F_NAME"="NAME"))

will yield all rows in Salary with all columns from both data frames.

I think your question may be more related to a full_join, but here is a good place to get familiar with the methods.

prince.e
  • 236
  • 2
  • 4