Suppose we have the test file generated in the Note at the end. We suppose that the actual file is much larger or else we could have just read it directly using read.csv
. Instead we use read.csv.sql
to only read in 2017 and 2018 into R. Then we use this much smaller data frame cutting it down to precisely the dates we want:
library(sqldf)
sql <- "select * from file where dates like '%2017' or dates like '%2018'"
dd <- read.csv.sql("testfile.csv", sql)
dd$dates <- as.Date(dd$dates, "%d/%m/%Y")
dd_sub <- subset(dd, dates > '2017-01-13' & dates <= '2018-01-13')
To show that it worked we display the first few and last few rows:
> head(dd_sub)
dates value
14 2017-01-14 744
15 2017-01-15 745
16 2017-01-16 746
17 2017-01-17 747
18 2017-01-18 748
19 2017-01-19 749
> tail(dd_sub)
dates value
365 2017-12-31 1095
366 2018-01-01 1096
367 2018-01-02 1097
368 2018-01-03 1098
369 2018-01-04 1099
370 2018-01-05 1100
The actual file you have may have different format than testfile.csv
so you will need to carefully read ?read.csv.sql
and set whatever arguments you need.
Note
# test input in reproducible form
dates <- format(as.Date("2015-01-01") + 1:1100, "%d/%m/%Y")
d <- data.frame(dates, value = seq_along(dates))
write.csv(d, "testfile.csv", row.names = FALSE, quote = FALSE)