6

I've started working with RSQLite and dplyr to efficiently process large datasets. However, I haven't been able to reconcile how to get RSQLite to format dates or what best practices are here. The example below should illustrate where the process falls apart for me:

library(tidyverse)
library(RSQLite)

Data

Date is formatted appropriately

date=seq(as.Date("1910/1/1"), as.Date("1911/1/1"), "days")
x=rnorm(length(date))
df1 <- tibble(date, x)
df1

# A tibble: 366 × 2
date           x
<date>       <dbl>
  1  1910-01-01  1.72459562
2  1910-01-02  0.88216253
3  1910-01-03 -0.35434587
4  1910-01-04 -0.63401467
5  1910-01-05  0.18136909
6  1910-01-06 -0.09513488
7  1910-01-07 -1.03252313
8  1910-01-08  0.40924962
9  1910-01-09  0.90759866
10 1910-01-10  0.60456596
# ... with 356 more rows

Create the database

dbname = "test.sqlite3"
con <- dbConnect(SQLite(), dbname)

Add df1 to the database

dbWriteTable(con, "test", df1, append=TRUE)

let's see what has been created

dbListTables(con)
dbListFields(con, "test")

Connect to the database

test_db <- src_sqlite(path=dbname)

What happened to the date?

We lose the formatting which is problematic for subsequent processing.

tbl(test_db, "test") 

Source:   query [?? x 2]
Database: sqlite 3.11.1 [test.sqlite3]
date           x
<dbl>       <dbl>
  1  -21915 -0.05640646
2  -21914 -0.05640646
3  -21913 -0.05640646
4  -21912 -0.05640646
5  -21911 -0.05640646
6  -21910 -0.05640646
7  -21909 -0.05640646
8  -21908 -0.05640646
9  -21907 -0.05640646
10 -21906 -0.05640646

Can anyone recommend strategies for dealing with dates when using RSQLite?

boshek
  • 4,100
  • 1
  • 31
  • 55
  • Check out this question and the included link about SQLite dates (or lack thereof): http://stackoverflow.com/questions/13462086/using-dates-with-rsqlite – blongworth Jan 11 '17 at 20:43
  • Funny - that helps a lot. Basically you need to reduce functionality to add functionality. – boshek Jan 11 '17 at 21:00
  • 2
    SQLite does not have a date type although it has date functions that can interpret certain number or strings. There are some examples here: https://github.com/ggrothendieck/sqldf The H2 database might be a better fit if you want to do such processing since it does have a date type. – G. Grothendieck Jan 11 '17 at 21:46
  • Well I think might stick with sqlite simply because I like the way `dplyr` functions with it. This situation really only rears it head because of `read_csv` (not used here) and when it reads in data, it will automatically detect if a column should be in date format. Easy fix though. – boshek Jan 11 '17 at 21:52

1 Answers1

5

There's no date or time datatype in SQLite. Two possible approaches are to save them as text or as an integer. I suspect they're getting stored as days before the unix epoch (1970-01-01) by default.

If you want to work with the dates as integers, you can select by date:

d <- as.numeric(as.Date("1910-01-04"))
filter(test, date < d)

Or convert back to a tbl with dates as dates:

collect(test) %>% mutate(date = as.Date(date, '1970-01-01'))

You could store the dates as text:

df1$ts <- as.character(df1$date)
dbWriteTable(con, "test", df1, overwrite=TRUE)
tbl(test_db, "test") 

This is subsettable in the way you'd hope:

`filter(test, date < "1910-01-04")`

You could also convert to a normal tbl and convert the text to dates:

collect(test) %>% mutate(date = as.Date(date))
blongworth
  • 337
  • 2
  • 10
  • 1
    You could use this trick for date-times as well. The POSIXct -> numeric and -> POSIXct conversions are both in this: `as.POSIXct(as.numeric(Sys.time()),origin="1970-01-01",tz="UTC")` – Dave X Sep 22 '17 at 01:35