3

Let's assume I've got a three-year daily time series, like below:

library(lubridate)
ts1 <- seq(ymd('2016-01-01'), ymd('2018-12-31'), '1 day')

Now I want to specify a period within the year, for example astronomical summer in the northern hemisphere, which starts on 21 June and ends on 23 September, and check which elements of my ts1 vector fall into this range in every year. How can I do that, ideally with lubridate, but not necessarily?

jakes
  • If you define the dates in a vector and then search in that list, will it not work: Example: `timerange <- seq(ymd('2016-06-21'), ymd('2016-09-23'), '1 day')` and filter `ts1[ts1 %in% timerange]` – Sonny Apr 12 '19 at 11:23
  • But that will find only summer in first year, not all three years. – jakes Apr 12 '19 at 11:26
  • Ya, the idea is to build a vector and use that for filtering. You could also run filters using `month`, `date`, `week`, etc functions – Sonny Apr 12 '19 at 11:27
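The month/day filtering suggested in the last comment could be sketched roughly as follows. This is only one possible reading of that suggestion, not code from the commenter, and the names `in_summer` and `summer_dates` are made up for illustration.

```r
library(lubridate)

ts1 <- seq(ymd('2016-01-01'), ymd('2018-12-31'), '1 day')

m <- month(ts1)  # month number, 1-12
d <- day(ts1)    # day of the month, 1-31

# 21 June to 23 September inclusive, in every year of the series
in_summer <- (m == 6 & d >= 21) | m %in% 7:8 | (m == 9 & d <= 23)

summer_dates <- ts1[in_summer]
range(summer_dates)        # first and last matching date
table(year(summer_dates))  # number of matching days per year
```

Because the test only looks at month and day, it applies to all three years at once and is not affected by leap years.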

4 Answers

2

I would create a new date variable putting all dates in the same year and then check:

library(lubridate)
library(dplyr)
ts1 <- seq(ymd('2016-01-01'), ymd('2018-12-31'), '1 day')
df <- tibble(odate = ts1)  # data_frame() is deprecated in favour of tibble()
df %>% mutate(temp_date = ymd(format(odate, "2000-%m-%d"))) %>%
    mutate(in_summer = temp_date %in% 
                        seq(ymd('2000-06-21'), ymd('2000-09-23'), '1 day')) %>%
    select(-temp_date)
## # A tibble: 1,096 x 2
##    odate      in_summer
##    <date>     <lgl>    
##  1 2016-01-01 FALSE    
##  2 2016-01-02 FALSE    
##  3 2016-01-03 FALSE    
##  4 2016-01-04 FALSE    
##  5 2016-01-05 FALSE    
##  6 2016-01-06 FALSE    
##  7 2016-01-07 FALSE    
##  8 2016-01-08 FALSE    
##  9 2016-01-09 FALSE    
## 10 2016-01-10 FALSE    
## # ... with 1,086 more rows

ymd(format(odate, "2000-%m-%d")) puts all dates into the year 2000. The choice of year is arbitrary, except that it should be a leap year (as 2000 is) so that 29 February still parses to a valid date.
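The same reference-year idea can be wrapped in a small base R function that returns a plain logical vector. This is a minimal sketch, not part of the answer above: `in_yearly_window` and its arguments are hypothetical names, and it assumes bounds given as "mm-dd" strings that do not wrap past 31 December.

```r
# Hypothetical helper: TRUE where a date lies between two month-day bounds
# (inclusive) in any year. Bounds are "mm-dd" strings.
in_yearly_window <- function(x, start = "06-21", end = "09-23", ref_year = 2000) {
  # ref_year should be a leap year so that 29 February still parses
  ref <- as.Date(format(x, paste0(ref_year, "-%m-%d")))
  ref >= as.Date(paste0(ref_year, "-", start)) &
    ref <= as.Date(paste0(ref_year, "-", end))
}

ts1 <- seq(as.Date("2016-01-01"), as.Date("2018-12-31"), by = "1 day")

summer <- ts1[in_yearly_window(ts1)]
range(summer)  # first and last matching date
```

Comparing against the two bounds directly avoids building the full sequence of reference dates for `%in%`.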

amatsuo_net
1

Create a character vector whose elements are of the form mmdd. Then ok is a logical vector indicating which elements of ts1 are within the desired ranges and the last line subsets ts1 to those ranges:

mmdd <- format(ts1, "%m%d") 
ok <- mmdd >= "0621" & mmdd <= "0923"
ts1[ok]
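The string comparison works because every "mmdd" value is zero-padded to the same width, so text order matches calendar order. As a rough sketch of how the idea might be reused, the hypothetical helper `in_mmdd_range` below also handles a range that wraps past the end of the year; the winter bounds are illustrative values only, not taken from the question.

```r
# Hypothetical helper: month-day range test on "mmdd" strings (inclusive).
in_mmdd_range <- function(x, start, end) {
  mmdd <- format(x, "%m%d")
  if (start <= end) {
    mmdd >= start & mmdd <= end   # range lies inside one calendar year
  } else {
    mmdd >= start | mmdd <= end   # range wraps past 31 December
  }
}

ts1 <- seq(as.Date("2016-01-01"), as.Date("2018-12-31"), by = "1 day")

summer <- ts1[in_mmdd_range(ts1, "0621", "0923")]
winter <- ts1[in_mmdd_range(ts1, "1221", "0319")]  # illustrative bounds only
```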
G. Grothendieck
1

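Here is a case_when approach. First we need the day-of-year values for our period, which lubridate::yday() provides, e.g. lubridate::yday("2019-06-21"). The start of astronomical summer in the northern hemisphere (21 June) is day 172 of a common year and day 173 of a leap year; the end (23 September) is day 266 of a common year and day 267 of a leap year. A fixed range of 172 to 267 therefore covers the period in every year, but it also picks up 20 June in leap years and 24 September in common years.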
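Day-of-year numbers shift by one after 29 February in a leap year, so it may be worth checking the bounds for both kinds of year before hard-coding them. A quick check, using 2016 (leap) and 2017 (common) from the series in the question:

```r
library(lubridate)

# 2016 is a leap year, 2017 is not
yday(ymd(c("2016-06-21", "2017-06-21")))  # 173 172
yday(ymd(c("2016-09-23", "2017-09-23")))  # 267 266
```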

Then we need to do the same for our data. Starting from your ts1, we convert it to a data.frame or tibble and compute yday for each date:

library(lubridate)
library(dplyr)

ts1 <- seq(ymd('2016-01-01'), ymd('2018-12-31'), '1 day')

ts1 <- tibble(date = (ts1),
              day = yday(ts1))

Using sqldf

library(sqldf)

sqldf("select ts1.*, case when (ts1.day >= 172 and ts1.day <= 267)
      then 1 else 0 end as TOY
      from ts1", method = c("Date", "numeric", "logical")) %>%
  as_tibble()

# A tibble: 1,096 x 3
   date         day TOY  
   <date>     <dbl> <lgl>
 1 2016-01-01     1 FALSE
 2 2016-01-02     2 FALSE
 3 2016-01-03     3 FALSE
 4 2016-01-04     4 FALSE
 5 2016-01-05     5 FALSE
 6 2016-01-06     6 FALSE
 7 2016-01-07     7 FALSE
 8 2016-01-08     8 FALSE
 9 2016-01-09     9 FALSE
10 2016-01-10    10 FALSE
# ... with 1,086 more rows

Using dplyr

ts1 %>%
  mutate(TOY = case_when(day >= 172 & day <= 267 ~ "summer",
                         TRUE ~ "other"))

# A tibble: 1,096 x 3
   date         day TOY  
   <date>     <dbl> <chr>
 1 2016-01-01     1 other
 2 2016-01-02     2 other
 3 2016-01-03     3 other
 4 2016-01-04     4 other
 5 2016-01-05     5 other
 6 2016-01-06     6 other
 7 2016-01-07     7 other
 8 2016-01-08     8 other
 9 2016-01-09     9 other
10 2016-01-10    10 other
# ... with 1,086 more rows
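The fixed thresholds 172 and 267 do not line up with 21 June and 23 September in every year, because day-of-year numbers shift by one after 29 February in leap years. One possible way to keep the case_when approach while staying exact is to recompute the bounds for the year of each row with lubridate::make_date(). This is a sketch of that variation rather than the answer's own code; `dates`, `seasons`, `start_day` and `end_day` are illustrative names.

```r
library(lubridate)
library(dplyr)

dates <- seq(ymd('2016-01-01'), ymd('2018-12-31'), '1 day')

seasons <- tibble(date = dates, day = yday(dates)) %>%
  mutate(
    # day-of-year bounds recomputed for the year of each row
    start_day = yday(make_date(year(date), 6, 21)),
    end_day   = yday(make_date(year(date), 9, 23)),
    TOY = case_when(day >= start_day & day <= end_day ~ "summer",
                    TRUE ~ "other")
  ) %>%
  select(-start_day, -end_day)

# boundary rows that a fixed 172-267 range would label as summer
seasons %>% filter(date %in% ymd(c("2016-06-20", "2017-09-24")))
```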
atsyplenkov
0

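You can do this with the data.table package. The dates must first be mapped onto the same reference year as search_dt (2000 here); otherwise %in% never matches any date from 2016 to 2018:

> library(data.table)
> library(lubridate)
> ts1 <- data.frame(date=seq(ymd('2016-01-01'), ymd('2018-12-31'), '1 day'))
> search_dt <- seq(as.Date("2000-06-21"), as.Date("2000-09-23"), by="days")
> # move every date to the reference year 2000 before matching
> setDT(ts1)[, ind := as.Date(format(date, "2000-%m-%d")) %in% search_dt]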

Output:

> ts1
            date   ind
   1: 2016-01-01 FALSE
   2: 2016-01-02 FALSE
   3: 2016-01-03 FALSE
   4: 2016-01-04 FALSE
   5: 2016-01-05 FALSE
  ---                 
1092: 2018-12-27 FALSE
1093: 2018-12-28 FALSE
1094: 2018-12-29 FALSE
1095: 2018-12-30 FALSE
1096: 2018-12-31 FALSE
Rushabh Patel