It looks like you are wanting to combine custom SQL code with auto-generated SQL code from dbplyr. For this it is important to distinguish between:
- DBI::db* commands, which execute the provided SQL on the database and return the result, and
- dbplyr translation, where you work with a remote connection to a table.
You can only combine these in certain ways. Below I have given several examples depending on your particular use case. All assume that DISTINCT is a command that is accepted in your specific SQL environment.
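To make the distinction concrete, here is a minimal sketch of the two approaches side by side (con and my_table are placeholders for a live DBI connection and an existing database table):

library(DBI)
library(dplyr)
library(dbplyr)

# DBI approach: send SQL yourself and get a local data frame back
local_df <- dbGetQuery(con, "SELECT DISTINCT col1 FROM my_table")

# dbplyr approach: work with a remote table and let dbplyr write the SQL
remote_df <- tbl(con, "my_table") %>%
  distinct(col1)
show_query(remote_df)   # inspect the auto-generated SQL without executing it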
Reference examples that cover many of the different use cases
If you'll excuse some self-promotion, I recommend you take a look at my dbplyr_helpers GitHub repository (here). This includes:
- a union_all function that takes in two tables accessed via dbplyr and outputs a single table using some custom SQL code (the sketch after this list shows the general pattern), and
- a write_to_database function that takes a table accessed via dbplyr and converts it to code that can be executed via DBI::dbExecute.
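For reference, the union_all helper follows the same general pattern as the custom SQL function shown further down: render the SQL for each input table and glue the pieces together with build_sql. The sketch below only illustrates that pattern (it is not the repository's actual code) and assumes both tables come from the same connection and have identical columns:

union_all_sketch <- function(table_a, table_b){
  # both tables are assumed to share one database connection
  db_connection <- table_a$src$con
  # stitch the two rendered queries together with custom SQL
  sql_query <- build_sql(
    con = db_connection,
    sql_render(table_a),
    "\nUNION ALL\n",
    sql_render(table_b)
  )
  return(tbl(db_connection, sql(sql_query)))
}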
Automatic piping
dbplyr automatically pipes your code into the next query for you when you are working with standard dplyr verbs for which there are SQL translations defined. As long as SQL translations are defined you can chain together many pipes (I have used 10 or more at once), with the (almost) only disadvantage being that the translated SQL query becomes difficult for a human to read.
For example, consider the following:
library(dbplyr)
library(dplyr)
tmp_df = data.frame(col1 = c(1,2,3), col2 = c("a","b","c"))
df1 = tbl_lazy(tmp_df, con = simulate_postgres())
df2 = tbl_lazy(tmp_df, con = simulate_postgres())
df = left_join(df1, df2, by = "col1") %>%
  distinct()
When you then call show_query(df), R returns the following auto-generated SQL code (though not as nicely formatted):
SELECT DISTINCT *
FROM (
SELECT `LHS`.`col1` AS `col1`, `LHS`.`col2` AS `col2.x`, `RHS`.`col2` AS `col2.y`
FROM `df` AS `LHS`
LEFT JOIN `df` AS `RHS`
ON (`LHS`.`col1` = `RHS`.`col1`)
) `dbplyr_002`
Note that the initial command (the left join) appears as a nested query, with the distinct in the outer query. Hence df is an R link to a remote database table defined by the above SQL query.
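If you want the actual rows rather than the remote link, collect() will execute the query and return a local data frame (assuming df is built on a live connection; simulate_postgres() connections cannot execute anything):

# executes the translated SQL on the database and pulls the result into R
local_result <- df %>% collect()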
Creating custom SQL functions
You can pipe dbplyr tables into custom SQL functions. Piping means that the thing being piped becomes the first argument of the receiving function.
custom_distinct <- function(df){
  # extract the database connection the remote table is built on
  db_connection <- df$src$con
  # wrap the table's rendered SQL in an outer DISTINCT query
  sql_query <- build_sql(
    con = db_connection,
    "SELECT DISTINCT * FROM (\n",
    sql_render(df),
    ") AS nested_tbl"
  )
  # return a new remote table defined by the custom SQL
  return(tbl(db_connection, sql(sql_query)))
}
df = left_join(df1, df2, by = "col1") %>%
  custom_distinct()
When you then call show_query(df), R should return the following SQL code (I say 'should' because I cannot get this working with simulated SQL connections), though not as nicely formatted:
SELECT DISTINCT * FROM (
SELECT `LHS`.`col1` AS `col1`, `LHS`.`col2` AS `col2.x`, `RHS`.`col2` AS `col2.y`
FROM `df` AS `LHS`
LEFT JOIN `df` AS `RHS`
ON (`LHS`.`col1` = `RHS`.`col1`)
) AS nested_tbl
As with the previous example, df is an R link to a remote database table defined by the above SQL query.
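Because tbl(db_connection, sql(...)) returns an ordinary lazy table, you can keep piping the result into further dplyr verbs and dbplyr will wrap your custom SQL in additional subqueries. For example (again assuming a live connection; the column names come from the example above):

df %>%
  filter(col2.x == "a") %>%   # dbplyr nests the custom DISTINCT query inside this filter
  show_query()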
Converting dbplyr to DBI
You can take the code from an existing dbplyr remote table and convert it to a string that can be executed using DBI::db* commands.
As another way of writing a distinct query:
df1 = tbl_lazy(tmp_df, con = simulate_postgres())
df2 = tbl_lazy(tmp_df, con = simulate_postgres())
df = left_join(df1, df2, by = "col1")

# build the SQL query as a plain text string
custom_distinct2 = paste0("SELECT DISTINCT * FROM (",
                          as.character(sql_render(df)),
                          ") AS nested_table")

# db_connection must be a live DBI connection (simulated connections cannot execute queries)
local_table = DBI::dbGetQuery(db_connection, custom_distinct2)
This will return a local R data frame containing the result of the same SQL query as in the previous examples.
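If you instead want the result written back to the database rather than returned to R (in the spirit of the write_to_database helper mentioned earlier), the same string can be wrapped in a table-creation statement and run with DBI::dbExecute. A minimal sketch, assuming your database supports CREATE TABLE ... AS syntax and that new_distinct_table is a name you choose:

# run the query on the database and store the result in a new table
create_table_sql = paste0("CREATE TABLE new_distinct_table AS ", custom_distinct2)
DBI::dbExecute(db_connection, create_table_sql)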