1

EXISTS/NOT EXISTS are features in SQL which are incredible powerful. Is there a way how I can use R commands or functions to perform what I have with EXISTS in SQL? Example:

# Two tables
x0 <- data.frame(a=rep(1:4,2))
x1 <- data.frame(a=c(2,2,3,5))

I want a new indicator b which gives me 1 if the value in x0 EXISTS in table x1 otherwise 0.

# Initialize b
x0$b <- 0   

# Update b 
sqldf(c("UPDATE x0
         SET b = 1
         WHERE EXISTS (SELECT 1
                       FROM x1
                       WHERE x0.a = x1.a
                      )"
        , "SELECT * FROM main.x0"
        )
      )

Results:

  a b
1 1 0
2 2 1
3 3 1
4 4 0
5 1 0
6 2 1
7 3 1
8 4 0
zx8754
  • 52,746
  • 12
  • 114
  • 209
giordano
  • 2,954
  • 7
  • 35
  • 57
  • 2
    `x0$b <- (x0$a %in% x1$a) + 0L`. Much simpler than SQL. – joran Oct 05 '16 at 15:09
  • Autsch! Yes. Put your answer as response so I can vote it. Two questions a) Why 0L and not only 0? b) `!((x0$a %in% x1$a)) + 0L` returns the logical values and not 0/1. Why? – giordano Oct 05 '16 at 15:48
  • Related post: https://stackoverflow.com/questions/16143445/minus-operation-of-data-frames – zx8754 May 09 '18 at 14:17

1 Answers1

2

Try this:

x0$b <- (x0$a %in% x1$a) + 0L

Using 0L rather than 0 is mostly me just being a bit pedantic; it ensures that everything remains as integers rather than numeric (doubles).

The reason !(x0$a %in% x1$a) + 0L returns the booleans, not the integers is due to operator precedence. The negation is applied last. Try moving it just one spot to the right as the first thing inside the parens: (!x0$a %in% x1$a) + 0L

As @Roland points out, if you find the whole + 0L bit too clever to be clear, you can always just do as.integer(x0$a %in% x1$a).

joran
  • 169,992
  • 32
  • 429
  • 468
  • 1
    As a pedant I would recommend explicit coercion using `as.integer`. – Roland Oct 05 '16 at 15:53
  • 1
    @Roland Yeah, I guess adding `0L` is more like being overly clever in a slightly pedantic manner. – joran Oct 05 '16 at 15:55
  • 1
    All joking aside, I expect that `as.integer` is also faster (although that should rarely be a concern since `+` is fast). You could also use `x0$b <- +(x0$a %in% x1$a)` if you want to code-golf. – Roland Oct 05 '16 at 15:59