
I don't understand the difference between inner_join and semi_join. Could you provide some examples?

According to the R documentation:

  • semi_join() returns all rows from x with a match in y.
  • inner_join() only keeps observations from x that have a matching key in y.
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294

1 Answer


The rows from x returned by semi_join() and inner_join() are the same. The difference is that inner_join() also adds the columns that are present in y but not in x, whereas semi_join() never adds any columns from y.

library(dplyr)  # provides inner_join() and semi_join()

x = data.frame(a = 1:3)
y = data.frame(a = 2:4, b = 10:12)

## with an inner join, the `b` column is part of the result
inner_join(x, y)
# Joining, by = "a"
#   a  b
# 1 2 10
# 2 3 11

## with a semi join, the `b` column is not part of the result
## because it is not part of `x`
semi_join(x, y)
# Joining, by = "a"
#   a
# 1 2
# 2 3
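
One caveat to "the rows are the same", sketched here with made-up data frames (x2 and y2 are hypothetical names, not part of the example above): if a key appears more than once in y, inner_join() returns one row per match, so rows of x can be repeated, while semi_join() returns each matching row of x at most once.

```r
library(dplyr)

# Hypothetical data: the key a = 2 appears twice in y2
x2 <- data.frame(a = 1:3)
y2 <- data.frame(a = c(2L, 2L, 3L), b = c(10, 11, 12))

# One output row per matching (x2, y2) pair, so a = 2 shows up twice
inner_res <- inner_join(x2, y2, by = "a")

# Each row of x2 is kept or dropped as a whole, never duplicated
semi_res <- semi_join(x2, y2, by = "a")

nrow(inner_res)
nrow(semi_res)
```

So the two joins agree on which rows of x survive, but only semi_join() guarantees that the result has no more rows than x.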

inner_join() is one of the "mutating joins", which are documented together at ?inner_join and described as

mutating joins add columns from y to x, matching rows based on the key

Compare this to the "filtering joins", which are documented together at ?semi_join:

Filtering joins filter rows from x based on the presence or absence of matches in y

Filtering joins only filter x; they do not add columns from y.
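
To make the "filtering" idea concrete, here is a minimal sketch assuming a single key column named a, as in the example data above. For that simple case, semi_join() should give the same rows as an ordinary filter() on whether the key appears in y. This is an illustration of the behavior, not a description of how dplyr implements it.

```r
library(dplyr)

x <- data.frame(a = 1:3)
y <- data.frame(a = 2:4, b = 10:12)

# Filtering join: keep rows of x whose key has a match in y
semi_res <- semi_join(x, y, by = "a")

# Plain filter on the same condition, written out by hand
filter_res <- filter(x, a %in% y$a)

semi_res
filter_res
```

Both results contain only the column a; nothing from y is carried over.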

Gregor Thomas