1

I've been working to parameterize a SQL Statement that uses the IN statement in the WHERE clause. I'm using rodbcext library for parameterizing but it seems to lack expansion of a list.

I was hoping to write code such as

sqlExecute("SELECT * FROM table WHERE name IN (?)", c("paul","ringo","john", "george")

I'm using the following code but wondered if there's an easier way.

library(RODBC)
library(RODBCext)

# Search inputs
names <- c("paul", "ringo", "john", "george")

# Build SQL statement
qmarks <- replicate(length(names), "?")
stringmarks <- paste(qmarks, collapse = ",")
sql <- paste("SELECT * FROM tableA WHERE name IN (", stringmarks, ")")
# expand to Columns - seems to be the magic step required
bindnames <- rbind(names)

# Execute SQL statement
dbhandle <- RODBC::odbcDriverConnect(connectionString)
result <- RODBCext::sqlExecute(dbhandle, sql, bindnames, fetch = TRUE)
RODBC::odbcClose(dbhandle)

It works but feel I'm using R to expand the strings in the wrong way (bit new to R - so many ways to do the same thing wrong). Somebody will probably say "that creates factors - never do that" :-)

I found this article which suggest I'm on the right track but it doesn't discuss having to expand the "?" and turn the list into columns of a data.frame

R RODBC putting list of numbers into an IN() statement

Thank you.

UPDATE: As Benjamin shows below - the sqlExecute function can handle a list() of inputs. However upon inspection of the resulting SQL I discovered that it uses cursors to rollup the results. This significantly increases the CPU and I/O over the sample code I show above.

While the library can indeed solve this for you, for large results it may be too expensive. There are two answers and it depends upon your needs.

Community
  • 1
  • 1
ripvlan
  • 460
  • 6
  • 14
  • In PHP i've seen a lot of code that puts variables directly in sql statement. And then if it works why should you care? Not all things are easy to do in R. And this is the first time I see R connecting to a database. – keiv.fly Jul 29 '16 at 18:58
  • Connecting to databases is one of R's primary functions – Carl Jul 29 '16 at 18:59

2 Answers2

1

Since your only parameter in the query is in collection for IN, you could get away with

sqlExecute(dbhandle,
    "SELECT * FROM table WHERE name IN (?)", 
    list(c("paul","ringo","john", "george")),
    fetch = TRUE)

sqlExecute will bind the values in the list to the question mark. Here, it will actually repeat the query four times, once for each value in the vector. It may seem kind of silly to do it this way, but when trying to pass strings, it's a lot safer in many ways to let the binding take care of setting up the appropriate quote structure rather than trying to paste it in yourself. You will generate fewer errors this way and avoid a lot of database security concerns.

Benjamin
  • 16,897
  • 6
  • 45
  • 65
  • I'll try that. When using c() an error was returned about unused "cond" and only the first value was used... but maybe list() is the magic. I found the code on github and didn't see anything obvious. Executing 4 statements might be okay in my case. – ripvlan Jul 30 '16 at 22:23
  • 2
    Strictly speaking, `sqkExecute` wants a data frame. Lists just happen to be easy to convert to data frames. Also, you won't notice that it makes four calls. `sqlExecute` loops over the rows of the data frame, extracts a data frame for each row, and binds them together at the end. From the R side, it will feel like one call, regardless of what happens on the SQL side. – Benjamin Jul 30 '16 at 22:58
  • Yes - thanks. That was a typo when I re-created my sample. I am using a data.frame ... data.frame(names=c("john",....) Which of course didn't work. – ripvlan Aug 01 '16 at 17:27
  • Hi @Benjamin - sorry for the delay, real work got in the way. Thank you for the help and hint using a list(). I used your example and found that it does indeed issue multiple queries! However, the implementation uses cursors to get the job done and the overall cpu/reads/duration is longer than issuing the native IN statement in SQL. In my specific case with a subset of data the overhead is significant (980 vs 296 ms and 5200 vs 126 reads). As my full set of data grows I worry about scalability. Thank you though - learned something about the library I didn't know. – ripvlan Aug 03 '16 at 16:41
0

What if you declare a variable table in a character object and then concatenate with the query.

library(RODBC)
library(RODBCext)

# Search inputs
names <- c("paul", "ringo", "john", "george")

# Build SQL statement
sql_top <- paste0( "SET NOCOUNT ON \r\n DECLARE @LST_NAMES TABLE (ID NVARCHAR(20)) \r\n INSERT INTO @LST_NAMES VALUES ('", paste(names, collapse = "'), ('" ) , "')")

sql_body <- paste("SELECT * FROM tableA WHERE name IN (SELECT id FROM @LST_NAMES)")
sql <- paste0(sql_top, "\r\n", sql_body)

# Execute SQL statement
dbhandle <- RODBC::odbcDriverConnect(connectionString)
result <- RODBCext::sqlExecute(dbhandle, sql, bindnames, fetch = TRUE)
RODBC::odbcClose(dbhandle)

The query will be (the set no count on is important to retrieve the results)

SET NOCOUNT ON 
DECLARE @LST_NAMES TABLE (ID NVARCHAR(20)) 
INSERT INTO @LST_NAMES VALUES ('paul'), ('ringo'), ('john'), ('george')
SELECT * FROM tableA WHERE name IN (SELECT id FROM @LST_NAMES)
  • Thanks for the idea. But it was easier to simple output the IN clause with the values. The temp table doesn’t buy anything for small lists but does add text across the wire to be compiled. I wanted to avoid sqlinjection issues and get a compiled reusable query plan. Generating the temp table code causes a different query to be generated and thus the plan can’t be reused. – ripvlan Feb 22 '19 at 02:45