How to subset monet.frame with %in% properly

Question

Everything seems to work if I run

subset(mdf, id %in% c("A","B"))

but I get an error if I run

ids = c("A","B")
subset(mdf,id %in% ids)

The following is demo code:

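library(DBI)
library(MonetDB.R)

# Connect to the MonetDB server and build a small test data frame
con1 = dbConnect(dbDriver("MonetDB"),"monetdb://go:50000/voc")
d = data.frame(id=base::sample(c("A","B","C","D"),100,replace=TRUE),v=sample(1:10,100,replace=TRUE),stringsAsFactors=FALSE)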
head(d)
str(d)

dbWriteTable(con1, "test", d)

mdf <- monet.frame(con1,"test")
subset(mdf, id %in% c("A","B"))

ids = c("A","B")
subset(mdf,id %in% ids)

MonetDB.R_0.8.0 DBI_0.2-7

R version 3.0.2 (2013-09-25) Platform: x86_64-pc-linux-gnu (64-bit)

The call subset(mdf, id %in% c("A","B")) is actually translated to SQL as:

MonetDB-backed data.frame surrogate
2 columns, 44 rows
Query: SELECT * FROM test WHERE ( (id IN ('A','B')) ) 
Columns: id (character), v (numeric)

The error message for IDS = c("A","B"); subset(mdf,id %in% IDS) is something like:

Error in .local(conn, statement, ...) : 
  Unable to execute statement 'SELECT COUNT(*) FROM test WHERE ( (id IN 'AB') ) '.
Server says 'syntax error, unexpected STRING, expecting '(' in: "select count(*) from test where ( (id in 'AB'"' [#42000].

I guess it is a MonetDB.R-specific issue. I just don't know how to work around it.

Thanks.

As a first step, this looks to be specific to the `MonetDB.R` package's `monet.frame` object. I can't replicate the error using a standard `data.frame`. Also, why not just use `mdf[mdf$id %in% ids,]` and avoid the known issues with `subset`? – thelatemail, Nov 26 '13 at 23:24
`ids` might be the name of another column of your object or something in its environment. Try using `IDS <- c("A", "B")` instead, just to confirm. The known shortcomings of `subset` that @thelatemail is referring to are explained here: http://stackoverflow.com/q/9860090/1201032 – flodel, Nov 26 '13 at 23:57
Thanks @thelatemail. Yes, it is specific to MonetDB.R, because monet.frame is defined by MonetDB.R. `mdf[mdf$id %in% ids,]` still returns an error in this case. The reason I did not use base R data frame tricks here is that the underlying data has at least millions of rows. MonetDB.R can retrieve the relevant final result very quickly and with a small memory footprint in R. – wind, Nov 27 '13 at 00:00
Thanks @flodel. Tried `IDS <- c("A","B")`. Still errors. I will edit the original post to add more error info. It's the package author's suggestion that "If you have any questions, please do so on stackoverflow using both the monetdb and r tags." – wind, Nov 27 '13 at 00:04
Looks like the generated SQL is incorrect. The DB is expecting something else, though I'm not quite sure what. I notice that in the second case it is `SELECT COUNT(*)` instead of just `SELECT *`, and that the `IN ('A','B')` has become `IN 'AB'`. The translation seems to be falling over. – thelatemail, Nov 27 '13 at 00:16
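
To make the difference between `IN ('A','B')` and `IN 'AB'` concrete, here is a minimal base R sketch. It does not use or describe MonetDB.R internals; `sql_in_list` is a hypothetical helper introduced only for illustration, and the "bad" string merely reproduces the shape of the clause in the error message, not how the package actually arrived at it.

```r
# Hypothetical helper (not part of MonetDB.R or DBI):
# turn a character vector into a SQL IN list such as ('A','B').
sql_in_list <- function(values) {
  # Double any embedded single quotes, then wrap each value in single quotes.
  quoted <- paste0("'", gsub("'", "''", values, fixed = TRUE), "'")
  paste0("(", paste(quoted, collapse = ","), ")")
}

ids <- c("A", "B")

# Quoting each element separately gives the form the server accepted.
good <- paste("SELECT * FROM test WHERE id IN", sql_in_list(ids))

# Collapsing the whole vector into one string gives the shape seen in the error.
bad <- paste0("SELECT * FROM test WHERE id IN '", paste(ids, collapse = ""), "'")

print(good)
print(bad)
```

Assuming the connection `con1` from the question is still open, the first string could be sent with DBI's `dbGetQuery(con1, good)`. That returns an ordinary `data.frame` rather than a `monet.frame`, so it only suits results small enough to hold in R.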

score 3 · Accepted Answer · answered Nov 27 '13 at 07:45

First of all, thanks for the very good bug report that was generated by the cooperation here. I had encountered this issue some time ago; it should be fixed in version 0.8.1 of the package, which is available on R-Forge (https://r-forge.r-project.org/R/?group_id=1534).


Hannes Mühleisen


Thanks @Hannes. Version 0.8.1 makes subsetting easy. And thank you very much for MonetDB.R, with which R users can easily have a database back end that is even faster than kdb+ (speed comparison based on my own experience only). – wind, Nov 27 '13 at 20:00
