13

I'm using the RSQLite package to make queries to a local SQLite database, and for some queries the RSQLite interface is quite slow.

As a specific example, the following query takes under one second to run using the sqlite3 command-line utility:

$ sqlite3 data/svn.db
SQLite version 3.7.5
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> select count(distinct svn_path.revision)  FROM src INNER JOIN svn_path ON src.filename=svn_path.path;
5039

But the equivalent query in R takes a little over two minutes and uses 100% of one of my CPUs:

> library(RSQLite)
Loading required package: DBI
> con <- dbConnect(SQLite(), dbname="data/svn.db")
> dbGetQuery(con, "select count(distinct svn_path.revision)  FROM src INNER JOIN svn_path ON src.filename=svn_path.path")
  count(distinct svn_path.revision)
1                              5039

Why is the performance so much slower through the R interface?

Note that I'm using R64 2.10.1 on Mac OS X 10.6.6.

Lorin Hochstein
  • 57,372
  • 31
  • 105
  • 141
  • Are you sure the query wasn't just cached on the server when you ran it via the command line? – Daniel Dickison Feb 28 '11 at 17:47
  • 7
    @Daniel Dickison: SQLite has no server. – Benoit Mar 01 '11 at 12:27
  • 2
    Maybe the RSQLite library does not the same engine version than the shell (3.7.5). – Benoit Mar 01 '11 at 12:28
  • 1
    Whoops, good point. I was thinking of MySQL for some reason. – Daniel Dickison Mar 01 '11 at 20:15
  • 2
    What matters is the version of RSQLite you have. Your version of R seems to be over a year old so if your RSQLite is just as old it could be a much older engine (eg, 3.6.4) as suggested by Benoit. Out of desperation you could try swapping the table order if src and svn_path are very different in size (# of rows) eg `select count(distinct svn_path.revision) FROM svn_path INNER JOIN src ON src.filename=svn_path.path`. If that makes it run faster, they're using different query plans. – Tony Lee Mar 03 '11 at 20:16
  • 1
    Try running EXPLAIN QUERY PLAN for your query both from R and from the command line to see if they differ. – Jouni K. Seppänen Mar 04 '11 at 21:44
  • 2
    @Tony Lee: Upgrading R addressed the problem. If you respond with an answer, I'll accept it. – Lorin Hochstein Mar 05 '11 at 15:22

1 Answers1

3

What matters is the version of RSQLite you have. Your version of R seems to be over a year old so if your RSQLite is just as old it could be a much older engine (eg, 3.6.4) as suggested by Benoit.

Tony Lee
  • 5,622
  • 1
  • 28
  • 45