3

Is there a simple way to use RSQLite and sqldf with more than 999 variables?

This question was raised also here. An answer on R-sig-DB

"The reason why this doesn't work seems to be SQLITE_MAX_VARIABLE_NUMBER in the sqlite source (in RSQLite) is currently set to 999."

Here is some sample code:

# works fine
widedf <- rnorm(2*998)
dim(widedf) <- c(2, 998)
f2 <- tempfile()
write.csv(widedf, f2, quote = FALSE)

f2 <- file(f2)
widedf <- sqldf("select * from f2", 
                dbname = tempfile(), 
                file.format = list(header = T, row.names = F))

# doesn't work
widedf <- rnorm(2*1000)
dim(widedf) <- c(2, 1000)
f1 <- tempfile()
write.csv(widedf, f1, quote = FALSE)

f1 <- file(f1)
widedf <- sqldf("select * from f1", 
                dbname = tempfile(), 
                file.format = list(header = T, row.names = F))
# Error in try({ : 
#  RS-DBI driver: (RS_sqlite_import: too many SQL variables)

What are the options in this situation?

Ryogi
  • 5,497
  • 5
  • 26
  • 46

1 Answers1

6

sqldf also works with certain other databases too. Try the H2 java database which is supported by sqldf and which has a csvread function. Install java if you don't already have it and then install the RH2 package (which includes H2 itself so there is nothing else to install).

First Approach

library(RH2)
library(sqldf)

# create test file
widedf <- as.data.frame(matrix(rnorm(2*1000), 2))
write.csv(widedf, "widedf.csv", row.names = FALSE)

# fac2num is defined on sqldf home page in FAQ #10
w.out <- sqldf("select * from csvread('widedf.csv')", method = fac2num)

Second Approach

A second alternate method that sets the types within H2 rather than using fac2num follows. The first line reads in just the first row of widedf.csv using read.csv and even drops that so that we get an empty data frame but with the correct column classes. The second line inserts the widedf.csv file into widedf0 and then in the second of the two sql statements selects it back so that the result of the sqldf call is the new table.

widedf0 <- read.csv("widedf.csv", nrows = 1)[0L, ]
w2.out <- sqldf(c("insert into widedf0 (select * from csvread('widedf.csv'))", 
    "select * from widedf0"))

In the above cases, sqldf will notice that RH2 is loaded and assume that you wanted to use H2 as the backend database. (The sqldf package also has a drv argument and a sqldf.driver option either of which can be used to force a particular backend regardless of what is loaded.)

There are also some additional examples of using H2's csvread from sqldf on the sqldf home page in FAQ #10 .

If you really want to do it with the RSQLite package as the backend then you could discuss with the RSQLite package maintainer whether its feasible for the 999 limit to be increased or you could build a private version of RSQLite with a higher limit.

EDIT:

Modified code above to use fac2num and also to illustrate a second method which sets the column types from H2. See FAQ #10 .

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341