2

I have a list of IDs in an R vector.

IDlist <- c(23, 232, 434, 35445)

I would like to write an RODBC sqlQuery with a clause stating something like

WHERE idname IN IDlist

Do I have to read the whole table and then merge it to the idList vector within R? Or how can I provide these values to the RODBC statement, so recover only the records I'm interested in?

Note: As the list is quite long, pasting individual values into the SQL statement, as in the answer below, won't do it.

dmvianna
  • 15,088
  • 18
  • 77
  • 106
  • Can you use `CREATE TABLE` and `INSERT INTO` statements? An option would be to create a table and insert values at intervals, say 100 at a time. Then use that table in your `WHERE` statement. – Blue Magister Oct 09 '12 at 03:40
  • I don't have write access to the database. I could create a temporary table, yes, but that would kind of defeat the purpose of doing it within R altogether. I might as well save the file and run the SQL code elsewhere, like with Python code. – dmvianna Oct 09 '12 at 04:44

2 Answers2

8

You could always construct the statement using paste

IDlist <- c(23, 232, 434, 35445)
paste("WHERE idname IN (", paste(IDlist, collapse = ", "), ")")
#[1] "WHERE idname IN ( 23, 232, 434, 35445 )"

Clearly you would need to add more to this to construct your exact statement

Dason
  • 60,663
  • 9
  • 131
  • 148
  • [SQL Server]Internal Query Processor Error: The query processor ran out of stack space during query optimization." No, pasting won't do. Any alternatives to me creating a temporary table within the SQL query? – dmvianna Oct 09 '12 at 02:57
  • That sounds like a different issue. I answered the question at hand. I would suggest asking a new question for the issue you're having since that's more of a SQL question. – Dason Oct 09 '12 at 03:17
  • 1
    I disagree. That is why I need a better solution than using paste. I have a long vector that won't fit in an SQL query, but I still need to get RODBC to use it as an SQL table, so it can match the values. I would be happy if you or someone else just told me "RODBC can't do it". If you're not sure, I'm happy to wait for someone else to come up with an answer. – dmvianna Oct 09 '12 at 03:20
  • 2
    @dmvianna See [this MS article](http://support.microsoft.com/kb/288095): "Rewrite the query and use a #temp table to contain the values in the IN list instead of using an IN clause." I suspect you will run into the same problem no matter what language you attempt this in. – Blue Magister Oct 09 '12 at 04:55
0

I put together a solution to a similar problem by combining the tips here and here and running in batches. Approximate code follows (retyped from an isolated machine):

#assuming you have a list of IDs you want to match in vIDs and an RODBC connection in mycon

#queries that don't change
q_create_tmp <- "create table #tmptbl (ID int)"
q_get_records <- "select * from mastertbl as X join #tmptbl as Y on (X.ID = Y.ID)"
q_del_tmp <- "drop table #tmptbl"

#initialize counters and storage
start_row <- 1
batch_size <- 1000
allresults <- data.frame()

while(start_row <= length(vIDs) {
    end_row <- min(length(vIDs), start_row+batch_size-1)
    q_fill_tmp <- sprintf("insert into #tmptbl (ID) values %s", paste(sprintf("(%d)", vIDs[start_row:end_row]), collapse=","))

    q_all <- list(q_create_tmp, q_fill_tmp, q_get_records, q_del_tmp)
    sqlOutput <- lapply(q_all, function(x) sqlQuery(mycon, x))

    allresults <- rbind(allresults, sqlOutput[[3]])
    start_row <- end_row + 1
}
Community
  • 1
  • 1