2

Possible Duplicate:
How to join data frames in R (inner, outer, left, right)?

This question nicely goes over joins in R.

And the Wikipedia article on inner join is useful.

I would like to reproduce this result in base R. I don't think the following work:

merge(employee,department,all=T)
merge(employee,department)

because of the duplicates on the merging variable.

However, sqldf works:

library(sqldf)
sqldf("select * from employee  inner join department ON employee.DepartmentID = department.DepartmentID")

My questions are:

  1. Does this mean that merge does not perform an inner join here.

  2. How can I accomplish this join in base R.

Here are two data.frames to work with from the website and example above.

employee <- structure(list(LastName = c("Rafferty ", "Jones", "Steinberg", 
"Robinson", "Smith", "John"), DepartmentID = c("31", "33", "33", 
"34", " 34", " .")), .Names = c("LastName", "DepartmentID"), class = "data.frame", row.names = c(NA, -6L))

department <- structure(list(DepartmentID = c(31L, 33L, 34L, 35L), DepartmentName =    structure(c(4L,2L, 1L, 3L), .Label = c("Clerical", "Engineering", "Marketing", "Sales"), class = "factor")), .Names = c("DepartmentID", "DepartmentName"), class = "data.frame", row.names = c(NA, -4L))
Community
  • 1
  • 1
julieth
  • 430
  • 4
  • 9
  • Your use of `merge` doesn't match the description of how to do an inner join at the question you link to. – joran Jun 15 '12 at 04:03
  • The sqldf output matches the example from the link. Is there a way to do this with merge? I've read in a couple of places that merge is an inner join so I'm also trying to understand the definition better. Thanks. – julieth Jun 15 '12 at 04:10
  • As I said, compare your code, `merge(employee,department,all = T)` with the instructions for doing an inner join at the question you link to above. You are not doing it as the accepted answer there instructs you to. – joran Jun 15 '12 at 04:12

1 Answers1

3

Julieth,

Please see if this works for you. I inspected the structure of the data structures to see why the results were not what was intended.

str(department)
str(employee)
employee$DepartmentID <- as.numeric(employee$DepartmentID)
merge(employee,department)
joran
  • 169,992
  • 32
  • 429
  • 468
Santosh
  • 192
  • 1
  • 9
  • `all.x = T` does not specify an inner join. As clearly described in the question the OP themselves link to, this will produce a left outer join. (But you're right, the other problem was that one of the ID columns was characters.) – joran Jun 15 '12 at 04:44
  • Thanks. Looks like I messed that one up. As Joran notes, the structures I used were not matched. Not a good showing for my first question. :) – julieth Jun 15 '12 at 04:55