8

Is there a way to make dplyr hooked up to a database pipe data to a new table within that database, never downloading the data locally?

I'd like to do something along the lines of:

tbl(con, "mytable") %>%
   group_by(dt) %>%
   tally() %>%
   write_to(name = "mytable_2", schema = "transformed")
jenswirf
  • 7,087
  • 11
  • 45
  • 65
  • 4
    This reminds me of the time I wrote a Python CGI script that called R that called a database that called a UDF written in R. Why not just write the SQL? – Hong Ooi Apr 26 '15 at 13:30
  • 12
    The **dplyr** syntax is arguably *much* nicer and readable than SQL - especially for when your query would be very long and have subqueries. – jenswirf Apr 26 '15 at 13:37
  • 2
    If you're going to be doing this a lot, I'd suggest you bite the bullet and learn SQL. It'll pay off in the long run. – Hong Ooi Apr 26 '15 at 13:41

1 Answers1

8

While I whole heartedly agree with the suggestion to learn SQL, you can take advantage of the fact that dplyr doesn't pull data until it absolutely has to and build the query using dplyr, add the TO TABLE clause, and then run the SQL statement using dplyr::do(), as in:

# CREATE A DATABASE WITH A 'FLIGHTS' TABLE
library(RSQLite)
library(dplyr)
library(nycflights13)
my_db <- src_sqlite("~/my_db.sqlite3", create = T)
flights_sqlite <- copy_to(my_db, flights, temporary = FALSE, indexes = list(
  c("year", "month", "day"), "carrier", "tailnum"))

# BUILD A QUERY
QUERY = filter(flights_sqlite, year == 2013, month == 1, day == 1) %>%
    select( year, month, day, carrier, dep_delay, air_time, distance) %>%
    mutate( speed = distance / air_time * 60) %>%
    arrange( year, month, day, carrier)

# ADD THE "TO TABLE" CLAUSE AND EXECUTE THE QUERY 
do(paste(unclass(QUERY$query$sql), "TO TABLE foo"))

You could even write a little functoin that does this:

to_table  <- function(qry,tbl)
    dplyr::do(paste(unclass(qry$query$sql), "TO TABLE",tbl))

and pipe the query into that function like so:

filter(flights_sqlite, year == 2013, month == 1, day == 1) %>%
    select( year, month, day, carrier, dep_delay, air_time, distance) %>%
    mutate( speed = distance / air_time * 60) %>%
    arrange( year, month, day, carrier) %>%
    to_table('foo')
Jthorpe
  • 9,756
  • 2
  • 49
  • 64
  • 11
    That's basically what `compute()` does – hadley Apr 26 '15 at 15:37
  • 2
    does not work anymore: `do(paste(unclass(QUERY$query$sql), "TO TABLE foo")) Error in UseMethod("do_") : no applicable method for 'do_' applied to an object of class "character"` – petermeissner Oct 05 '15 at 08:11
  • @hadley where does `compute()` put the table. I don't see it getting generated in the souce db.schema – chrowe Jan 08 '19 at 19:19
  • @chrowe It creates it as a temporary table by default – hadley Jan 08 '19 at 20:41
  • @hadley is there a way to have it story in a regular table instead? – chrowe Jan 10 '19 at 00:44
  • 2
    @chrowe I was going to say read the docs, but I discovered that I forgot to document those functions I just added them to the dev version: https://dbplyr.tidyverse.org/dev/reference/collapse.tbl_sql.html – hadley Jan 10 '19 at 14:47
  • @hadley ok great. that works. So to be clear. This is just executing sql queries, not pulling any data into R? – chrowe Jan 10 '19 at 18:29