R equivalent to SAS's "In" data set option for including and excluding overlapping data

Question

I'm usually a SAS user but was wondering if there was a similar way in R to list data that can only be found in one data frame after merging them. In SAS I would have used

data want;
    merge have1 (In=in1) have2 (IN=in2) ;
    if not in2;
run;

to find the entries only in have1. My R code is:

inner <- merge(have1, have2, by= "Date", all.x = TRUE, sort = TRUE)

I've tried setdiff() and antijoin(), but neither seems to give me what I want. Additionally, I would like to do the converse: find the entries in have1 and have2 that share the same "Date" value and keep the remaining variables from both data frames. For example, consider have1 with columns "Date", "ShotHeight", "ShotDistance" and have2 with columns "Date", "ThrowHeight", "ThrowDistance", so that the new data frame, call it "new", has columns "Date", "ShotHeight", "ShotDistance", "ThrowHeight", "ThrowDistance".

I believe the more standard terminology here is anti-join, IIUC. hopefully this can help to continue searching... both data.table and dplyr have clean/natural implementations of this functionality built in — MichaelChirico, Apr 15 '18 at 16:13
@Renu this is not a dupe because what OP wants is not listed in the options in that question. — Hong Ooi, Apr 15 '18 at 16:13
Possibly https://stackoverflow.com/questions/28702960/find-complement-of-a-data-frame-anti-join/28703077#28703077 — IceCreamToucan, Apr 15 '18 at 16:15
`in` is an operator as well as a data step option, but it's technically not a function. — Reeza, Apr 15 '18 at 19:18
Your SAS code has no `BY` statement. As written, you're doing a line by line merge of have1 and have2. That's different than the R code you've written, which merges two data frames based on the value of the `County` column in each. Is this what you intended? — Len Greski, Apr 15 '18 at 19:40

score 1 · Answer 1 · answered Apr 15 '18 at 17:37

Assuming only one by-variable, the simplest solution is not to merge at all:

want <- subset(have1, !(county %in% have2$county))

This subsets have1 to exclude rows where the value of county is in have2.
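
As a rough illustration on toy data, here is the same idea keyed on the "Date" column from the question instead of county. The values below are invented for the example, and the name only_in_have1 is hypothetical. The last line sketches the converse case from the question (rows whose Date appears in both data frames, with the columns of both), which base R's merge() returns by default.

```r
# Toy data: column names follow the question, the values are made up.
have1 <- data.frame(
  Date         = as.Date(c("2018-04-01", "2018-04-02", "2018-04-03")),
  ShotHeight   = c(1.2, 1.5, 1.1),
  ShotDistance = c(10, 12, 9)
)
have2 <- data.frame(
  Date          = as.Date(c("2018-04-02", "2018-04-03", "2018-04-04")),
  ThrowHeight   = c(2.0, 2.2, 1.9),
  ThrowDistance = c(20, 25, 18)
)

# Rows of have1 whose Date never appears in have2
# (roughly the SAS "if not in2;" case).
only_in_have1 <- subset(have1, !(Date %in% have2$Date))

# Rows whose Date appears in both data frames, keeping the columns of both
# (roughly the SAS "if in1 and in2;" case). merge() defaults to an inner join.
new <- merge(have1, have2, by = "Date")

print(only_in_have1)
print(new)
```

If dplyr is available, anti_join(have1, have2, by = "Date") should give the same rows as the subset() call, which may be more convenient when there is more than one by-variable.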

Hong Ooi
