
I am trying to use a parametrized query multiple times with package:ROracle.

library(DBI)
conn <- dbConnect(ROracle::Oracle(), ...)

statement <- "select * from bigtable where name = :1"
name <- c("Bob", "Alice", "Carol")

I expected the query to be executed for each row of the data frame (i.e. as with the params argument in DBI::dbGetQuery()), but that is not the case:

dbGetQuery(conn, statement, data.frame(name))
# Error in .oci.GetQuery(conn, statement, data = data, prefetch = prefetch,  : 
#   bind data has too many rows

I can loop over the names, but I think it may be less efficient (I am new to RDBMSs, but I have read about query optimization/caching):

lapply(name, function(i) dbGetQuery(conn, statement, data.frame(i)))

How can it be done efficiently?

There are some related questions that do not solve the problem:


1 Answer


I think there are a few ways to approach this.

  1. Insert (to a temp table), single-query on a join:

    mynames <- data.frame(name = c("Bob", "Alice", "Carol"))
    dbWriteTable(conn, "mytemp", mynames)
    alldat <- dbGetQuery(conn, "
      select bt.*
      from bigtable bt
        inner join mytemp my on bt.name=my.name")
    

    This method has the advantage of being a single query. If you need the results split by name later, you can always use R's split() (see the sketch after this list).

  2. (Apparently this doesn't work in ROracle?) Create a prepared statement and repeatedly bind/fetch:

    res <- dbSendStatement(conn, "select * from bigtable where name = :1")
    eachdat <- lapply(name, function(nm) { dbBind(res, list(nm)); dbFetch(res); })
    dbClearResult(res)
    

    I do not tend to use this method, though that's as much preference as anything else.

  3. Use SELECT ... NAME IN (...)

    # statement becomes "select * from bigtable where name in (:1,:2,:3)"
    statement <- paste0("select * from bigtable where name in (", paste(paste0(":", seq_along(name)), collapse = ","), ")")
    alldat <- dbGetQuery(conn, statement, as.list(name))  # one list element per :n placeholder
    

    The paste(...) in the statement produces ":1,:2,:3" (counting along the length of name), which should be compatible with your original query mechanism.
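
Coming back to the first option: if you later need the result broken up per name, here is a minimal sketch of the split step mentioned there (assuming the joined result is in alldat, and that Oracle returns the column upper-cased as NAME; adjust the column name if your driver preserves case):

    # split the single-query result back into one data frame per name;
    # unquoted identifiers usually come back upper-cased from Oracle, hence NAME
    pername <- split(alldat, alldat$NAME)
    pername[["Alice"]]   # only the rows matching "Alice"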

Having said that ... every server has its own optimization methods, but I suspect that with your lapply(.., \(z) dbGetQuery(..)), your server will cache the optimized version since your query does not change between iterations. A DBA more familiar with Oracle might be able to clarify or refute my claim, but I have a hard time believing that it does not cache the query for some time (in a SQL statement cache).
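
To make that concrete, here is a minimal sketch of the loop from the question with the per-name results stacked afterwards (the do.call(rbind, ...) step and the pieces/combined names are my additions, not part of the original code):

    # the statement text is identical on every iteration, so the server can
    # reuse its parsed/optimized plan; only the bound value changes
    pieces <- lapply(name, function(i) dbGetQuery(conn, statement, data.frame(i)))
    combined <- do.call(rbind, pieces)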

r2evans
  • Thank you very much for your answer. However, i) I can't use the temp table/join approach because the real table is extremely big and my query would get killed by the DBA. They want us to run parametrized queries multiple times. ii) The second method does not seem to be implemented in `ROracle`. I get the following error with `ROracle::dbSendStatement`: `Error in .oci.SendQuery(conn, statement, data = data, prefetch = prefetch,: bind data does not match bind specification`. – Thomas Jan 24 '23 at 14:20
  • You mean that the `name` data you have is huge? Or that `bigtable` is huge and they want you to query it iteratively? – r2evans Jan 24 '23 at 14:21
  • Yes, `bigtable` is huge and they want us to query it iteratively. – Thomas Jan 24 '23 at 14:25
  • I find it hard to believe that the DBAs would prefer that you hit a big table _repeatedly_ when you can get it all done efficiently in a single query. If they don't want you to use temp tables (a counter-productive "precaution" in my experience, but I see it periodically), then consider my (newly added) 3rd option above, still getting it all done in one query. – r2evans Jan 24 '23 at 14:33
  • Thanks again for your effort! I'm sorry, but your third proposition won't work either. I already tried it and observed a dramatic drop in performance... I have maybe oversimplified my example. My real use case is querying the French nationwide health claim database. Hundreds of researchers have access to the database for research purposes. We have to perform our queries month by month, otherwise (i) we see a dramatic drop in performance and (ii) the DBA may kill our queries. – Thomas Jan 24 '23 at 14:45
  • It seems your only way out is your third original code block, `lapply(name, function(i) dbGetQuery(conn, statement, data.frame(i)))`. I don't think you're going to lose much (if any) efficiency due to statement optimization/cache, so if that's the only way the DBAs allow you to query that data, it's all you have. (I query large datasets every day and would _cringe_ if that were my only method.) Good luck! – r2evans Jan 24 '23 at 14:47
  • Ok, thank you very much for your patience and all your advice! BTW I contacted the ROracle maintainer about what we discussed as a "significant under-implementation of the DBI specification in ROracle" in [my other question](https://stackoverflow.com/a/75222037/11148823). – Thomas Jan 24 '23 at 14:53
  • Awesome, thank you for reaching out. I'm not an author/maintainer of DBI/`odbc`, and I don't access any Oracle instances, but I use them often enough and have spoken at length with the authors to know that I want things to be as stable and consistent across the DBI drivers as possible ... it's odd/frustrating that it has taken so long for some drivers to implement some of what I call "base requirements" (though not strictly 'required'). Thanks! – r2evans Jan 24 '23 at 15:37
  • The main point of my email was: "Several features that have been in the DBI 'specifications' for well before the current ROracle was released in 2021 currently seem not implemented in ROracle." I just got an answer from the `ROracle` maintainer a few hours later: "Hi, Thanks for your email, we are working on them. It is a priority for us. Thanks". Wait and see... – Thomas Jan 24 '23 at 16:49