How do you do non standard joins ('col1' != 'col2')
in dplyr when you are working with a database.
Example:
Setting up the database:
library(dplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
tableA <- data.frame(col1= c("a","b","c","d"),
col2 = c(1,2,3,4))
copy_to(con, tableA)
Here is the sql join I would like to do using dplyr code when working with the database:
SQL code:
tbl(con, sql("select a.col1, b.col2
from
tableA as a
inner join
tableA as b
on a.col1 <> b.col1")) %>%
arrange(col1, col2)
Results:
# Source: SQL [?? x 2]
# Database: sqlite 3.19.3 [:memory:]
# Ordered by: col1, col2
col1 col2
<chr> <dbl
1 a 2
2 a 3
3 a 4
4 b 1
5 b 3
6 b 4
7 c 1
8 c 2
9 c 4
10 d 1
# ... with more rows
dplyr code:
Here is the initial dplyr code I have attempted when recreating the sql code above:
tbl(con,"tableA")->dbtableA
dbtableA %>%
inner_join(dbtableA, by = c('col1' != 'col1')) %>%
select(col1, col2=col2.x) %>%
arrange(col1, col2)
Error:
by
must be a (named) character vector, list, or NULL for natural joins (not recommended in production code), not logical
When attempting to solve this using tidyr package (code below) I get an error message:
library(tidyr)
dbtableA %>%
expand(col1,col2) %>%
left_join(dbtableA, by = 'col1') %>%
filter(col2.x != col2.y) %>%
select(col1, col2 = col2.x) %>%
arrange(col1, col2)
Error: in UseMethod("expand_") : no applicable method for 'expand_' applied to an object of class "c('tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"
Does anyone know how to code up this join in dplyr code when using a database? Many thanks in advance.