4

I'm working with SQL and user input. So I'm using the glue library to handle with parameterized queries.

However, to keep things clean, I wrapped that all in a function:

safeQuery <- function(con, sql, ...) {  
  sql = glue_sql(sql, ..., .con=con)
  query <- dbSendQuery(con, sql)
  out <- dbFetch(query)
  dbClearResult(query)
  return(out)
}

So I just call that function with the connection, the SQL code and a list of parameters for that SQL code which glue_sql binds appropriately.

This works just fine.

Now, I have a specific SQL call which I use in one way or another quite often, but with varying parameters.

So I decided to create a function for that:

get_data <- function(con, params) {
  safeQuery(con,
            "SELECT *
             FROM foo
             WHERE bar IN ({vars*})",
            vars=params)
}
p = c(1, 2)
get_data(con, p) 

So, the user data (in this case c(1, 2)) would be passed to get_data, which would pass it along with the SQL call to safeQuery, where glue_sql will take care of the bindings.

However, if I actually try to run get_data, I get an error

object 'params' not found

Googling and SO'ing has made it clear this has to do with R's lazy evaluation.

And indeed, changing get_data to

get_data <- function(con, params) {
  do.call("safeQuery",
          list(con,
               "SELECT *
                FROM foo
                WHERE bar IN ({vars*})",
               vars=params)
}

(as recommended by this answer) works just fine, since do.call evaluates the parameters in the list before sending them to safeQuery.

What I don't understand is why this is necessary in the first place. After all, the value of params isn't modified at any step along its way to glue_sql, so it should still be available.

The linked answer discusses the use of substitute (I also read this R-bloggers post on the subject) to replace the argument's name with the caller's name for it (or with its actual value if the argument value was given directly), but that didn't work in my case. Modifying get_data to use substitute:

get_data <- function(con, params) {
  do.call("safeQuery",
          list(con,
               "SELECT *
                FROM foo
                WHERE bar IN ({vars*})",
               vars=substitute(params))
}

resulted in the following SQL from glue_sql:

SELECT *
FROM foo
WHERE bar IN (params)

instead of the actual values of params. I couldn't try the same within safeQuery since the parameters are hidden in ... and substitute(...) doesn't work. I tried.

I've also tried calling force(params) at the start of get_data, but that gave the same object not found error.

get_data <- function(con, params) {
  force(params)
  do.call("safeQuery",
          list(con,
               "SELECT *
                FROM foo
                WHERE bar IN ({vars*})",
               vars=params)
}

So, why does params get "lost" with the standard calls? Why does do.call work, but not force(params)? Is it possible to make this work using standard evaluation?

And I'm not going to lie: this experience has confused me about how to write my functions and handle their parameters (I'm considering only using do.call from now on). If tips can be given without over-extending the scope of this question, I'd be much obliged.

Zoe
  • 27,060
  • 21
  • 118
  • 148
Wasabi
  • 2,879
  • 3
  • 26
  • 48
  • 1
    You never show how you call `get_data`, perhaps that might help me to understand what is going wrong. Can you include that? – r2evans Jul 31 '19 at 23:14
  • @r2evans oops. Edited the question with an example of the call to get_data right after its first definition. – Wasabi Jul 31 '19 at 23:19
  • 1
    It works for me if your first line of `safeQuery` is instead `do.call(glue_sql, c(sql, list(...), .con=con))`. I'm not sure why (interesting question), but it works. (I recognize this is just a different view of your paragraph *"changing get_data to..."*.) – r2evans Jul 31 '19 at 23:27

1 Answers1

1

I'm not entirely clear why this works, but it does.

safeQuery <- function(con, sql, ...) {
  dots  = list(...)
  dots
}

Then when you call get_data("foo_con", params = 1:3), you get:

$`vars`
[1] 1 2 3

So now we have the argument in a named list, which means you should use glue_data (or glue_data_sql):

safeQuery <- function(con, sql, ...) {
  dots  = list(...)
  glue_data_sql(.x = dots, sql, .con=con)
  # More code...
}

Now when you call get_data("foo_con", params = 1:3), you get:

<SQL> SELECT *
FROM foo
WHERE bar IN (1, 2, 3)

Alternate version:

The problem is about which environment the arguments are being evaluated in. One way to force this is to pass along the environment you want:

safeQuery <- function(con, sql, ..., .envir = parent.frame()) {
  dots  = list(...)
  glue_sql(sql, ..., .con=con, .envir = .envir)
  # More code...
}

get_data <- function(con, params) {

  env <- environment()

  safeQuery(con,
            "SELECT *
            FROM foo
            WHERE bar IN ({vars*})",
            vars=params, .envir = env)
}

get_data("foo_con", params = 1:3)
<SQL> SELECT *
FROM foo
WHERE bar IN (1, 2, 3)
Brian
  • 7,900
  • 1
  • 27
  • 41
  • 1
    I don't have access to my computer right now, so can't check this myself. Looking at that last chunk, though, that doesn't look right. `glue_sql` should output a single SQL call which ends with `IN (1, 2, 3)`. – Wasabi Aug 01 '19 at 01:26
  • @Wasabi you're right, I transposed the `*` when I was fiddling with the `get_data` function. It returns the correct output now. – Brian Aug 01 '19 at 01:49