0

I have a data frame that has a column for zip code. I also have a data frame that has a list of zip codes and which Metropolitan Statistical Area they belong to. I'm trying to append the MSA based on zip code. The zip codes in either data frame are not guaranteed to be in the other, and each zip code in the master data may be seen more than once. The data frame starts with 779 rows and should end with 779 rows. I've tried the merge command below

sheet <- merge(sheet, msa, by = "Zip", all.x = TRUE, all.y=FALSE)

However, the resulting data frame 1881 rows.

I've also tried using plyr

test <- join(sheet, msa, by = "Zip")

This also yields a data frame with 1881 rows.

I think I could do with what I want with %in% and a for loop, but I was hoping there was a command that could do what I want.

Thanks in advance for the help.

Mike Edinger
  • 175
  • 1
  • 4
  • 11
  • `each zip code in the master data may be seen more than once` Then it's not surprising that it ends with more rows than it started with. What do you want to happen in the cases where a zip code occurs multiple times in `msa`? – David Robinson Sep 03 '15 at 17:05
  • It sounds like you want a `dplyr::left_join(sheet, msa)`. Try [including at least a little data for easier trouble shooting](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – jeremycg Sep 03 '15 at 17:22
  • @jeremycg `dplyr::left_join` is equivalent to `merge(... all.x = T, all.y = F)` and to `plyr::join()` (which has default `type = "left"`). The issue here is what to do with multiple matches, and 100% agree that there's a need for reproducibility. A 5-10 row example with desired output would make things nice and clear. – Gregor Thomas Sep 03 '15 at 19:12
  • I'm dumb. I didn't realize there were duplicate values in the msa data frame. Once I removed the duplicates there, I had no problems. I'm having other difficulties, and when i make a new post tomorrow, I'll make sure I cobble together a dummy data set. I can't use any of the real thing since it has sensitive info. – Mike Edinger Sep 03 '15 at 20:48

1 Answers1

0

I removed duplicates from the msa column using excel. Once I did that, it solved that particular problem.

Once duplicates were removed, the simple merge command worked:

merge(sheet, msa, by = "Zip", all.x = TRUE, all.y = FALSE)
Mike Edinger
  • 175
  • 1
  • 4
  • 11
  • Could you please provide the code you used to solve the problem so this post might be useful to others in the future. – Rich Scriven Sep 03 '15 at 20:52
  • There wasn't really a command; I removed duplicates in excel and then the normal merge() command did what I wanted. I've updated my solution to reflect that. – Mike Edinger Sep 04 '15 at 13:15