I'm usually a SAS user, but I was wondering whether there is a similar way in R to list the rows that are found in only one data frame when merging two. In SAS I would have used

data want;
    merge have1 (In=in1) have2 (IN=in2) ;
    if not in2;
run;

to find the entries only in have1. My R code is:

inner <- merge(have1, have2, by= "Date", all.x = TRUE, sort = TRUE)

I've tried setdiff() and anti_join(), but neither seems to give me what I want. Additionally, I would like to find a way to do the converse of this: find the entries in have1 and have2 that have the same "Date" entry, and keep the remaining variables from both data frames. For example, consider have1 with columns "Date", "ShotHeight", "ShotDistance" and have2 with columns "Date", "ThrowHeight", "ThrowDistance", so that the new data frame, call it "new", has columns "Date", "ShotHeight", "ShotDistance", "ThrowHeight", "ThrowDistance".

regents
  • I believe the more standard terminology here is anti-join, IIUC. Hopefully this helps you continue searching. Both data.table and dplyr have clean, natural implementations of this functionality built in. – MichaelChirico Apr 15 '18 at 16:13
  • @Renu this is not a dupe because what OP wants is not listed in the options in that question. – Hong Ooi Apr 15 '18 at 16:13
  • Possibly https://stackoverflow.com/questions/28702960/find-complement-of-a-data-frame-anti-join/28703077#28703077 – IceCreamToucan Apr 15 '18 at 16:15
  • `in` is an operator as well as a data step option, but it's technically not a function. – Reeza Apr 15 '18 at 19:18
  • Your SAS code has no `BY` statement. As written, you're doing a line-by-line merge of have1 and have2. That's different from the R code you've written, which merges two data frames based on the value of the `Date` column in each. Is this what you intended? – Len Greski Apr 15 '18 at 19:40

1 Answer

Assuming only one by-variable, the simplest solution is not to merge at all:

want <- subset(have1, !(Date %in% have2$Date))

This subsets have1 to exclude rows whose Date value also appears in have2, leaving only the rows found in have1 alone.
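As a rough sketch of both halves of the question, here is the same idea applied to the question's Date column. The data values are made up purely for illustration, and `only_in_have1` is a hypothetical name. The second step assumes that a plain inner join is what is wanted for the "same Date in both" case; merge() does this by default because all = FALSE.

```r
# Toy data, invented for illustration only (column names taken from the question)
have1 <- data.frame(
  Date         = c("2018-04-01", "2018-04-02", "2018-04-03"),
  ShotHeight   = c(1.2, 1.5, 1.1),
  ShotDistance = c(10, 12, 9),
  stringsAsFactors = FALSE
)
have2 <- data.frame(
  Date          = c("2018-04-02", "2018-04-03", "2018-04-04"),
  ThrowHeight   = c(2.0, 2.2, 1.9),
  ThrowDistance = c(20, 22, 18),
  stringsAsFactors = FALSE
)

# Anti-join: rows of have1 whose Date does not occur in have2
only_in_have1 <- subset(have1, !(Date %in% have2$Date))

# Inner join: rows whose Date occurs in both, keeping the other columns of each.
# merge() keeps only matching rows unless all, all.x or all.y is set to TRUE.
new <- merge(have1, have2, by = "Date")

print(only_in_have1)
print(new)
```

If dplyr is available, anti_join(have1, have2, by = "Date") and inner_join(have1, have2, by = "Date") should give equivalent results, up to row order and row names.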

Hong Ooi