I am sending a simple query to a MySQL database using dbplyr
:
library(dbplyr)
my_conn<-dbConnect(...)
tab<-tbl(my_conn, "myTable")
tab %>% select(id, date, type) %>%
filter(type = 'foobar')
However, if I inspect the generated SQL with show_query()
, I get this:
SELECT *
FROM (SELECT `id`, `date`, `type`
FROM `myTable`) `q01`
WHERE (`type` == 'foobar')
This query is very slow to execute.
If I instead execute the following SQL command on the server:
SELECT id, date, type FROM db.myTable WHERE type = 'foobar'
then the return is nearly instantaneous.
My question is: why is dbplyr
doing SELECT *
(i.e. select all) and then doing a nested select in line 2? Also, why has "q01" appeared in this query? Could this be why my query is very slow to run compared to executing the minimal command directly on the server? I would perhaps expect dbplyr
to create inefficient SQL queries as the complexity of the operations increases but--in this case at least--I can't really write a more succinct set of operations. It's just a select and filter (SELECT and WHERE).