Filter with specific combination in dplyr

Question

there must be a simple solution to the following problem but I cannot figure it out by myself and my searches were not successful unfortunately:

In a data.frame with multiple columns:

x <- data.frame(id = c(1, 1, 2, 2),
                dat = c("a", "b", "a", "b"),
                val = 1:4)

I want to filter out only those observations that I have in another data.frame:

y <- data.frame(id = 1:2,
                dat = c("a", "b"))

The result should look like this:

> z
  id dat val
1  1   a   1
2  2   b   4

The following approach doesn't lead to the desired result:

> x %>% filter(id %in% y$id & dat %in% y$dat)
  id dat val
1  1   a   1
2  1   b   2
3  2   a   3
4  2   b   4

Can someone please point me into the right direction?

Thanks a lot in advance!

You are looking for `dplyr::semi_join(x,y)` – Frank May 04 '18 at 14:19 — Frank, May 04 '18 at 14:19

score 3 · Accepted Answer · answered May 04 '18 at 13:48

3

We could use an inner_join

library(dplyr)
inner_join(x, y)

Or merge from base R

merge(x, y)

answered May 04 '18 at 13:48

akrun

I tried this earlier but only specified `by = c("id")`. The correct way would be `by = c("id", "dat")`; just as `dplyr` will do automatically if `by` is not specified at all. – Frederick May 04 '18 at 14:18
@Frederick Yes, if there are many columns that are matching, you may need to specify only by the intended columns. Here, it is just that both the columns are matching in both datasets. So, it would automatically pick those – akrun May 04 '18 at 14:19

1 Answers1