18

I am new to R but am interested in using Shiny to create dynamic charts using data stored in a SQL Server database. To enable interactivity, I want to bring in the raw data from the database and perform calculations within R rather than have the database summarize the data.

I am able to connect to the database using RODBC, execute a query, and receive results in a data.frame. However, the read time in R is about 12x longer than than the same query executed in SQL Server Management Studio (SSMS). SSMS takes ~600 ms, whereas R takes about 7.6 seconds. My question is whether I am doing something wrong, or is R just really slow with database access? And if so, are there faster alternatives (e.g. writing the database output to a file and reading the file)?

Some information about the query that may help: The query retrieves about 250K rows with 4 columns. The first column is a date and the other three are numeric values. The machine running R and SSMS is a high-end Win 7 workstation with 32GB of memory. The R command that I am running is:

system.time(df <- sqlQuery(cn, query))

which returns:

user  system elapsed
7.17   0.01   7.58

Interestingly, it appears that the data transfer from SQL to my machine is fast, but that R is busy doing things internally for several seconds before returning the data.frame. I see this because network utilization spikes in the first second and almost immediately returns to near 0. Then several seconds later, the R data.frame returns.

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
Jayhawk
  • 183
  • 1
  • 1
  • 6
  • Pretty sure it's just that slow, we evaluated it a while back and decided to go another route then using RODBC, it's just way to slow except for very simple queries. – Hansi Jun 19 '15 at 18:25

2 Answers2

15

I would try RJDBC http://cran.r-project.org/web/packages/RJDBC/RJDBC.pdf

with these drivers https://msdn.microsoft.com/en-us/sqlserver/aa937724.aspx

library(RJDBC)
drv <- JDBC("com.microsoft.sqlserver.jdbc.SQLServerDriver","/sqljdbc4.jar") 
con <- dbConnect(drv, "jdbc:sqlserver://server.location", "username", "password")
dbGetQuery(con, "select column_name from table")
Ken Yeoh
  • 876
  • 6
  • 11
  • 2
    Thank you! This absolutely solved the problem. Elapsed time dropped to 0.84 seconds as a result. – Jayhawk Jun 19 '15 at 19:55
  • Can we use RJDBC in windows to speed up the process? – Rishi Oct 02 '18 at 19:29
  • Yes. JDBC (Java Database Connectivity) can be used on OSX/Linux/Windows – Ken Yeoh Oct 12 '18 at 22:33
  • 3
    JDBC(driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver", classPath = "~/HTS/Data/R/Functions/Driver/sqljdbc41.jar") Error in .jfindClass(as.character(driverClass)[1]) : class not found Any idea what is causing the problem ? – Dorian Grv Dec 10 '18 at 16:52
  • Im having issues trying to use RJDBC as well. https://stackoverflow.com/questions/71578602/how-to-connect-to-sql-server-via-rjdbc – Andres Mora Apr 04 '22 at 12:42
2

I would make sure that your R timezone - Sys.setenv(TZ='GMT') set to GMT for example - is same as the time zone of the SQL server from where you are pulling data. It could be that the date column is taking a long time to be interpreted especially if it has a timestamp.

RJDBC will run quicker because it converts date to character and everything else to numeric. RODBC will try to preserve the data type of the SQL table.

Alex
  • 15,186
  • 15
  • 73
  • 127
charliealpha
  • 307
  • 2
  • 12