1

I am passing a query string to a connection and the result should look like this:

select game_name, month, count(*) as count
  from device_metrics.mtu_events
 where YEAR = '2019' and month between '07' and '09'
 group by game_name, month
 order by game_name asc, month asc

This works fine if I pass the above block as a single string into DBI::dbGetQuery(con, myquery)

But the dates are a variable within a shiny app, so I tried to create a function to generate the query string:

my_query <- function(start_date, end_date) {
        yr <- year(ymd(start_date))
        month_start <- month(ymd(start_date))
        month_end <- month(ymd(end_date))

        query <- paste0(
            "select game_name, month, count(*) as count
   from device_metrics.mtu_events
   where YEAR = ", yr, " and month between ", month_start, " and ", month_end, 
            " group by game_name, month
 order by game_name asc, month asc")

        return(query)
    }

When I call this function and try to query our database with it I get:

An error has been thrown from the AWS Athena client. Athena Error No: 372, HTTP Response Code: 1, Error Message: SYNTAX_ERROR: line 3:15: '=' cannot be applied to varchar, integer

Is there a 'right' way to do this? How can I construct a query string with variables and pass then to DBI::dbGetQuery()

Doug Fir
  • 19,971
  • 47
  • 169
  • 299
  • 2
    I guess you need `where YEAR = '", yr, "' and month between '", month_start, "' and '", month_end, "' group_by grame_name, month` because the single quotes around the 'yr', 'month_start', 'month_end' are not in the `paste0` – akrun Jan 25 '20 at 00:22
  • Would that be the equivilent as e.g. this further up `month_start <- month(ymd(start_date)) %>% str()` – Doug Fir Jan 25 '20 at 00:25
  • I guess the `month(Sys.Date())` returns numeric Try `my_query <- function(start_date, end_date) { yr <- year(ymd(start_date)) month_start <- month(ymd(start_date)) month_end <- month(ymd(end_date)) query <- paste0( "select game_name, month, count(*) as count from device_metrics.mtu_events where YEAR = '", yr, "' and month between '", month_start, "' and '", month_end, "' group by game_name, month order by game_name asc, month asc") return(query) }` – akrun Jan 25 '20 at 00:26
  • Have you looked at `glue`/`stringr::str_glue`? – camille Jan 25 '20 at 00:29
  • 2
    Frankly, I would prefer to use `DBI::dbBind` with parameterized queries as opposed to doing the quoting and such (https://bobby-tables.com/). (I'm not familiar with Athena's dialect, not sure how well it's supported.) – r2evans Jan 25 '20 at 00:30
  • If you have to do string-substitution like this, though, I would **much** prefer/recommend the use of `sQuote` for string literals, as that handles mid-string quotes (whereas using the literal `'` does not). – r2evans Jan 25 '20 at 00:31
  • 1
    Some ideas on parameterized SQL queries [here](https://stackoverflow.com/q/26754346/5325862) and [here](https://stackoverflow.com/q/16178640/5325862) – camille Jan 25 '20 at 00:36

2 Answers2

3

Here are two options where we can quote (') the input as a string as month, year functions return numeric values

my_query <- function(start_date, end_date) {
        yr <- year(ymd(start_date))
        month_start <- month(ymd(start_date))
        month_end <- month(ymd(end_date))

        query <- paste0(
            "select game_name, month, count(*) as count
   from device_metrics.mtu_events
   where YEAR = '", yr, "' and month between '", month_start, "' and '", month_end, 
            "' group by game_name, month
 order by game_name asc, month asc")

        return(query)
    }

Using sprintf

my_query <- function(start_date, end_date) {
            yr <- year(ymd(start_date))
            month_start <- month(ymd(start_date))
            month_end <- month(ymd(end_date))

            query <- sprintf("select game_name, month, count(*) as count
       from device_metrics.mtu_events
       where YEAR = '%d'  and month between '%02d' and '%02d' group by game_name, month
     order by game_name asc, month asc", yr, month_start, month_end)

            return(query)
        }
akrun
  • 874,273
  • 37
  • 540
  • 662
2

We can use fn$ from gsubfn to perform string interpolation. We will use NULL instead of an actual connection and fn$list instead of fn$dbGetQuery for purposes of reproducibility but you can replace both. Do not use underscores or dots in names when doing this.

library(gsubfn)

yr <- 2019
monthStart <- '07'
monthEnd <- '09'
con <- NULL

fn$list(con, "select game_name, month, count(*) as count
  from device_metrics.mtu_events
 where YEAR = '$yr' and month between '$monthStart' and '$monthEnd'
 group by game_name, month
 order by game_name asc, month asc")

It could alternately be done in two steps:

query <- fn$c("select game_name, month, count(*) as count
  from device_metrics.mtu_events
 where YEAR = '$yr' and month between '$monthStart' and '$monthEnd'
 group by game_name, month
 order by game_name asc, month asc")
dbGetQuery(con, query)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341