1

I use sqldf function for data manipuation in R studio. And I met a problem that sqldf function doesn't take into account the condition I set up in my query. For example the code

d2 <- sqldf("select a.'n1.0', a.date, a.tot_cap*100/a.'n1.0' as afasf from data a where a.date < '2013-03-01'") 

gives exact the same result as code

d2 <- sqldf("select a.'n1.0', a.date, a.tot_cap*100/a.'n1.0' as afasf from data a").

Setting single quotes, double quotes, no quotes at all doesn't help.

Does anybody know what's the problem and how to resolve it?

Thanks in advance.

Here is the result of dput(data2) command where data2 is the first 40 rows of the initial dataset.

structure(list(REGN = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), date = structure(c(14761, 
14791, 14822, 14853, 14883, 14914, 14944, 14975, 15006, 15034, 
15065, 15095, 15126, 15156, 15187, 15218, 15248, 15279, 15309, 
15706, 15737, 15765, 15796, 15826, 15857, 15887, 15918, 15949, 
15979, 16010, 16040, 16071, 16102, 16130, 16161, 16191, 16222, 
16252, 16283, 16314), class = "Date"), tot_cap = structure(c(3.29680129097603e-316, 
3.27881038454837e-316, 3.31569035934127e-316, 3.42544269478722e-316, 
3.46449744773999e-316, 3.76857311386686e-316, 3.83375272419446e-316, 
3.89781752479879e-316, 3.88106524094525e-316, 3.91119504385185e-316, 
3.54278081534629e-316, 3.66904833254234e-316, 3.84160896084211e-316, 
3.83063042693901e-316, 3.87586035817945e-316, 3.98758302742101e-316, 
4.04557250040469e-316, 4.19023427922167e-316, 4.27870651560933e-316, 
5.55371435659506e-316, 5.45611929687006e-316, 5.73639651252866e-316, 
5.74625569130456e-316, 5.74576631928235e-316, 5.79667637503333e-316, 
5.86047635645149e-316, 5.66957630781748e-316, 6.16857297583704e-316, 
6.25638237375425e-316, 6.32633446059442e-316, 6.54992194176386e-316, 
6.44650520772079e-316, 6.41496840466797e-316, 6.51734725500882e-316, 
6.52890122716964e-316, 6.49014508749322e-316, 6.47399926921995e-316, 
6.49043322657787e-316, 6.47938176858544e-316, 6.65456810876004e-316
), class = "integer64"), bas_cap = structure(c(0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 5.24913731265079e-316, 5.71723269425777e-316, 
5.71723269425777e-316, 5.72079658738749e-316, 5.71723269425777e-316, 
5.71723269425777e-316, 5.71723269425777e-316, 5.84030513832873e-316
), class = "integer64"), osn_cap = structure(c(0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 5.24913731265079e-316, 5.71723269425777e-316, 
5.71723269425777e-316, 5.72079658738749e-316, 5.71723269425777e-316, 
5.71723269425777e-316, 5.71723269425777e-316, 5.84030513832873e-316
), class = "integer64"), n1.0 = c(NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 14.13, 13.91, 13.74, 13.25, 
13.11, 13.59, 13.07, 13.06), n1.1 = c(NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 11.65, 12.3, 12.11, 11.75, 
11.65, 12.04, 11.61, 11.53), n1.2 = c(NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 11.65, 12.3, 12.12, 11.77, 
11.66, 12.06, 11.62, 11.54), n1 = c(13.96, 13.66, 15.07, 15.75, 
15.49, 15.87, 15.7, 16.32, 16.17, 16.67, 14.97, 15.06, 15.22, 
15.02, 14.68, 14.42, 13.5, 13.29, 13.1, 13.25, 12.53, 13.28, 
13.35, 13.42, 13.67, 14.14, 13.1, 14.13, 14.13, 14.55, 14.53, 
14.51, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, 40L
), class = "data.frame")
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
  • 3
    Please share your input data, or a representative example, by adding the output of `dput(example_df)` to the question. Otherwise this issue you're asking about is not reproducible – IceCreamToucan Nov 12 '19 at 18:13
  • I would be very thanful to you if you help me to load data to the site. – David Bijoyan Nov 12 '19 at 18:26
  • You can do so by running `dput(example_df)` and copying the ouput, then editing this question to paste the output at the bottom. It is possible this will be too large if you have a big table. In that case you would need to create an example data frame that has the same issue, possibly by subsetting your original data frame. – IceCreamToucan Nov 12 '19 at 18:28
  • 1
    May be you can check [here](https://stackoverflow.com/questions/37665367/r-sqldf-not-being-selective-in-date-range-criteria) Possible duplicate – akrun Nov 12 '19 at 18:30
  • Based on G. Grothendieck's comment [here](https://stackoverflow.com/questions/47933142/filter-by-date-range-in-sqldf) and [here](https://stackoverflow.com/questions/28996390/sqldf-how-to-query-based-on-a-date-condition), the default DB engine for sqldf only sees dates as their numeric representation. So it seems you need to do `a.date < 15765` to get the filter to work. (15765 is `as.numeric(as.Date('2013-03-01'))`) – IceCreamToucan Nov 12 '19 at 18:34
  • The comments you've mentioned helped me to overcome my issues. Thanks at all. – David Bijoyan Nov 12 '19 at 18:51

1 Answers1

2

There are several problems here:

  • single quotes are used for quoting constants. Use double quotes to quote variable names.

  • SQLite does not support a Date type so the dates are read in as numbers (number of days since the UNIX Epoch) and then the statement is attempting to compare that number to a character string.

  • although not wrong you don't really need the alias a since there is only one data frame so there cannot be any confusion as to what is being referenced.

  • the question refers to data2 but data is used in the R code. We have changed the R code to use data2.

See the links given in the comments under the question for more information.

SQLite

Using the default SQLite backend, we preface sqldf with fn$ to cause it to run anything between backticks in R and replace that expression with the result of that run.

d2 <- fn$sqldf("select \"n1.0\", date, tot_cap * 100/ \"n1.0\" as afasf 
  from data2 
  where date < `as.Date('2013-03-01')`") 

H2

Alternately, use the H2 backend. (As with SQLite, the H2 database is included in the R driver so you don't have to install it separately but you do need to ensure that you have Java installed -- fortunately that is extremely simple as it has an automated installation procedure.) That database does have a Date type and is able to compare such dates with correclty formatted character strings representing dates.

library(RH2)  # <----------note

d3 <- fn$sqldf("select \"n1.0\", date, tot_cap * 100 / \"n1.0\" as afasf 
  from data2
  where date < '2013-03-01'") 

Be sure to detach RH2 from the search path if you want to resume using SQLite.

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341