1

I have an ODBC connection to a SQL Server database. From R, I want to query a table with lots of data, but fetch only those records that match my R data frame on certain columns (an INNER JOIN). Currently I link the ODBC tables in MS Access 2003 (linked tables "dbo_name") and run the relational queries there, without downloading the entire table. I need to reproduce this process in R while avoiding downloading the entire table (i.e. avoiding `sqlFetch()`).

I have read through the RODBC, DBI, and rsqlserver package documentation without success. Is there any package or way to do this?

Romm
    Use `dbWriteTable` to upload your R `data.frame` to a temporary table in the database, and then use `dbGetQuery` to do the join. – nograpes Mar 06 '14 at 09:24
  • Hi, I've tried that without success. The database is owned by the government and I do not have permission to write to it; R displays an error about administration privileges. Thank you very much! – Romm Mar 07 '14 at 11:57
  • Hm, have you tried creating a temporary table (also see my answer below)? – nograpes Mar 08 '14 at 11:50
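
For reference, here is a minimal sketch of the temporary-table approach nograpes suggests in the comments, assuming a DBI connection `con` made with the odbc package, a data frame `a` holding the join keys in a column `id`, and a server table `big_table` (all of these names are illustrative). Temporary tables live in tempdb, so creating one is sometimes allowed even when the database itself is read-only:

library(DBI)

# Upload only the join keys; on SQL Server a leading "#" makes
# the table session-local and temporary
dbWriteTable(con, "#my_ids", a["id"], temporary = TRUE)

# The join runs on the server, so only matching rows come back
result <- dbGetQuery(con, "
  SELECT t.*
  FROM big_table AS t
  INNER JOIN #my_ids AS i ON t.id = i.id")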

3 Answers

2

If you can't write a table to the database, there is another trick you can use: you essentially build a giant WHERE clause. Let's say you want to join the database table `table` to your `data.frame` called `a` on the column `id`. You could say:

# Collapse the ids into one comma-separated string
ids <- paste0(a$id, collapse = ',')
# If a$id is character, each value must also be quoted:
# ids <- paste0("'", a$id, "'", collapse = ',')
dbGetQuery(con, paste0('SELECT * FROM table WHERE id IN (', ids, ')'))

From your comment, it seems that SQL Server has a problem with a query of that size. I suspect you will have to "chunk" the query into smaller pieces and then join the results together. Here is an example that splits the ids into 1,000 chunks, queries each chunk, and then combines the results.

# Split the ids into 1000 roughly equal-sized chunks
id.chunks <- split(a$id, seq(1000))
result.list <- lapply(id.chunks, function(ids)
  dbGetQuery(con,
             paste0('SELECT * FROM table WHERE id IN (',
                    paste0(ids, collapse = ','), ')')))
combined.results <- do.call(rbind, result.list)
nograpes
  • Hi nograpes! I made the giant WHERE statement from a vector of 395,474 elements. Unfortunately I get the following error: "[SQL Server] insufficient memory to run this query 42000 701 [Microsoft][SQL Server Native Client 10.0]". I think the problem now is that my computer is 32-bit. – Romm Jul 21 '14 at 11:52
  • The memory problem is on the server, and not on your machine - I doubt that having a 32-bit architecture makes any difference. I suspect that the giant WHERE statement is too much for SQL Server. Try splitting your `id`s into chunks, and then combining them, as I describe in the update to my answer. – nograpes Jul 21 '14 at 19:22
  • My suspicion was confirmed on SO: [you cannot make a query of that size](http://stackoverflow.com/questions/1069415/limit-on-the-where-col-in-condition). – nograpes Jul 21 '14 at 19:24
  • nograpes, thank you very much!! Your help was very useful. I used your code and got errors from the server that I don't quite understand (I'm new), but I solved it in an unorthodox way that I posted as an answer to my question. – Romm Jul 24 '14 at 07:25
  • Sorry, there was an error: I had forgotten to `paste` together the `ids`. It should work now. – nograpes Jul 24 '14 at 12:32
0

The problem is solved. The vector of ids was divided into 1,000 groups, and each group was queried against the server in turn. I show the unorthodox code below. Thanks nograpes!

# "lani1" is the vector with 395.474 ids 
id.chunks<-split(lani1,seq(1000))
        for (i in 1:length(id.chunks)){
           idsi<-paste0(paste0("'",as.vector(unlist(id.chunks[i])),"'"),collapse=',')
             if(i==1){ani<-sqlQuery(riia,paste0('SELECT * FROM T_ANIMALES WHERE an_id IN (',idsi,')'))
                     }
                else {ani1<-sqlQuery(riia,paste0('SELECT * FROM T_ANIMALES WHERE an_id IN (',idsi,')'))
                     ani<-rbind(ani,ani1)
                     }                
                                      }
Romm
0

I adapted the answer above, and the following worked for me without needing raw SQL syntax. The table I used is from the AdventureWorks SQL Server sample database.

library(dplyr)

# Lazy reference to the remote table; nothing is downloaded yet
lazy_dim_customer <- tbl(conn, dbplyr::in_schema("dbo", "DimCustomer"))

# Create a data frame of customer ids

adv_customers <- tbl(conn, "DimCustomer")

query1 <- adv_customers %>%
  filter(CustomerKey < 20000) %>%
  select(CustomerKey)

# collect() executes the query and brings the ids into R
d00df_customer_keys <- query1 %>% collect()

# Chunk customer ids, filter, collect and bind

id.chunks <- split(d00df_customer_keys$CustomerKey, seq(10))

result.list <- lapply(id.chunks, function(ids)
  lazy_dim_customer %>%
    filter(CustomerKey %in% ids) %>%
    select(CustomerKey, FirstName, LastName) %>%
    collect())

combined.results <- do.call(rbind, result.list)
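
As a small follow-up (my suggestion, not part of the original answer): `dplyr::bind_rows()` is a drop-in replacement for `do.call(rbind, ...)` here, and it copes better when the chunks come back with columns in a different order:

# Assumes result.list from the chunked queries above
combined.results <- dplyr::bind_rows(result.list)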