2

I'm trying to filter a dataframe based on a date range using sqldf, like the example code below. I have data like the example data below. The datedf dataframe that gets returned by sqldf has no records. There are records in the SHV dataframe in that date range, can anyone see what I'm doing wrong and let me know how to filter by date range in sqldf. Dates are always tricky in r for me.

Code:
datedf<-sqldf("select field1                            
            ,fieldDate
                            from SHV
                            where fieldDate between '2004-01-01' and '2005-01-01'
                            ")


Data:

dput(SHV[1:50,c("field1","fieldDate")])
structure(list(field1 = c(1378L, 1653L, 1882L, 2400L, 
2305L, 2051L, 2051L, 2051L, 1796L, 2054L, 2568L, 1290L, 1804L, 
1804L, 3855L, 1297L, 2321L, 2321L, 2321L, 2071L, 2071L, 2074L, 
2588L, 1567L, 1317L, 1317L, 808L, 808L, 1321L, 2350L, 1586L, 
2613L, 1590L, 2614L, 2107L, 1340L, 1085L, 1085L, 2365L, 1344L, 
1601L, 1858L, 1603L, 1603L, 1860L, 2376L, 1355L, 1867L, 2382L, 
1872L), fieldDate = structure(c(12551, NA, NA, 14057, 15337, 
12919, 13336, 10325, 14984, 15643, 12864, 11242, 10749, 11207, 
10602, NA, 12646, 15649, NA, NA, NA, NA, NA, 17015, 13938, NA, 
16693, NA, NA, 12634, 12614, 10689, 12755, 10844, 11375, 4899, 
17298, 10905, 11450, NA, 10330, 15429, 12634, 10504, 12625, 11081, 
10939, NA, 12934, 11176), class = "Date")), .Names = c("field1", 
"fieldDate"), row.names = c(NA, 50L), class = "data.frame")
modLmakur
  • 531
  • 2
  • 8
  • 24
  • There are no records of that date range. – sm925 Dec 21 '17 at 21:54
  • 1
    SQLite has no date or time data types so it sends Date class variables as the R Date type's internal representation which is the number of days since the Epoch. If you use the RH2 back end (H2 does have a date type) instead of the default RSQLite backend then your SQL statement will work as is. `library(sqldf); library(RH2); sqldf(...your statement...)` – G. Grothendieck Dec 22 '17 at 00:45

3 Answers3

1

Per the sqldf() documentation, one needs to format the dates into their numeric values in order to have them processed as dates. This can be accomplished with sprintf() when generating the SQL query.

SHV <- structure(list(field1 = c(1378L, 1653L, 1882L, 2400L, 
                          2305L, 2051L, 2051L, 2051L, 1796L, 2054L, 2568L, 1290L, 1804L, 
                          1804L, 3855L, 1297L, 2321L, 2321L, 2321L, 2071L, 2071L, 2074L, 
                          2588L, 1567L, 1317L, 1317L, 808L, 808L, 1321L, 2350L, 1586L, 
                          2613L, 1590L, 2614L, 2107L, 1340L, 1085L, 1085L, 2365L, 1344L, 
                          1601L, 1858L, 1603L, 1603L, 1860L, 2376L, 1355L, 1867L, 2382L, 
                          1872L), fieldDate = structure(c(12551, NA, NA, 14057, 15337, 
                                                              12919, 13336, 10325, 14984, 15643, 12864, 11242, 10749, 11207, 
                                                              10602, NA, 12646, 15649, NA, NA, NA, NA, NA, 17015, 13938, NA, 
                                                          16693, NA, NA, 12634, 12614, 10689, 12755, 10844, 11375, 4899, 
                                                          17298, 10905, 11450, NA, 10330, 15429, 12634, 10504, 12625, 11081, 
                                                          10939, NA, 12934, 11176), class = "Date")), .Names = c("field1", 
                                                                                                                 "fieldDate"), row.names = c(NA, 50L), class = "data.frame")

library(sqldf)
sqlStmt <- paste("select field1, fieldDate from SHV",
                 "where fieldDate between ",
                 sprintf("%d and %d",as.Date('2004-01-01','%Y-%m-%d'),
                     as.Date('2005-01-01','%Y-%m-%d')))
datedf<-sqldf(sqlStmt)
datedf

> datedf
  field1  fieldDate
1   1378 2004-05-13
2   2321 2004-08-16
3   2350 2004-08-04
4   1586 2004-07-15
5   1590 2004-12-03
6   1603 2004-08-04
7   1860 2004-07-26
> 

The sprintf() statement converts the dates to a numeric value, which ensures that the between operator in SQL works correctly.

> sqlStmt
[1] "select field1, fieldDate from SHV where fieldDate between  12418 and 12784"
>
Len Greski
  • 10,505
  • 2
  • 22
  • 33
  • 1
    This could be simplified to: `fromDate <- as.Date('2004-01-01'); toDate <- as.Date('2005-01-01'); fn$sqldf("select field1, fieldDate from SHV where fieldDate between $fromDate and $toDate")` – G. Grothendieck Dec 21 '17 at 23:02
0

In this sample of data you have no records in that date range:

SHV[SHV$fieldDate >= "2010-01-01" & SHV$fieldDate < "2011-01-01",]
  field1 fieldDate
NA        NA      <NA>
NA.1      NA      <NA>
NA.2      NA      <NA>
NA.3      NA      <NA>
NA.4      NA      <NA>
NA.5      NA      <NA>
NA.6      NA      <NA>
NA.7      NA      <NA>
NA.8      NA      <NA>
NA.9      NA      <NA>
NA.10     NA      <NA>
NA.11     NA      <NA>
NA.12     NA      <NA>
AidanGawronski
  • 2,055
  • 1
  • 14
  • 24
  • Agreed -there's a gap between 2008-06-27 and 2011-01-01. But there's a problem with the syntax. Extending the range from 2000 to 2011 still returns zero rows. – wibeasley Dec 21 '17 at 21:56
  • @wibeasley Thank you, yes I didn't check the sample data to make sure there were records in the date range. I've updated the original post so there are now records in the date range in the sample date. Do you see what the issue is with my code? – modLmakur Dec 21 '17 at 22:01
-1

According to this article, the date field should be converted to a character before executing sqldf.

Before passing any dates to SQLdf we need to first convert them to strings. Otherwise, SQLdf will try to treat them as numbers–which will cause a lot of heartache.

...

Instead, we should convert the DateCreated column to a string instead of a date. Then, SQL will actually convert it from a string to a date.

Confused? Imagine me when I was trying to figure this out on my own.

So your code could be:

SHV$fieldDate <- as.character(SHV$fieldDate)

datedf <- sqldf("
  SELECT
    field1,
    fieldDate
  FROM SHV
  WHERE fieldDate between '2004-01-01' and '2005-01-01'
  --WHERE '2004-01-01' <= fieldDate --and fieldDate <= '2005-01-01'
  ORDER BY fieldDate
")

# Both should equal 7.  Verify that null rows are handled as desired.
nrow(datedf)
sum(as.Date('2004-01-01') <= SHV$fieldDate & SHV$fieldDate <= as.Date('2005-01-01'), na.rm=T)

I wish it had some more explanation about when it casts a date-looking variable into an actual date. If you're looking around more, this SO response by @g-grothendieck takes a different approach and equate the data types inside the sqldf query.

wibeasley
  • 5,000
  • 3
  • 34
  • 62
  • Converting the numeric date values to character is risky because it does not work correctly for all date formats, such as '%d-%m-%Y' where the character comparison '12-01-2004' > '01-02-2004' returns TRUE even though February 1, 2004 is greater than January 12, 2004. – Len Greski Dec 21 '17 at 22:30
  • As far as I can tell, `as.character(SHV$fieldDate)` will always return [ISO 8601](https://xkcd.com/1179/). There's no need for anything like '%d-%m-%Y' in this code because it's starting as a Date type. – wibeasley Dec 21 '17 at 22:40
  • I agree that in this specific example there is no need for anything like '%d-%m-%Y', it was meant to illustrate a situation where a character comparison between two dates returns the wrong answer. I have seen coders generate defects with this type of comparison in multiple programming languages. – Len Greski Dec 21 '17 at 22:59