0

I am working in R. I have a date sequence and I would like assign whether each particular date is term time or a school holiday. I plan to do this using a dataframe column where each row is labelled "Holiday" or "Term"

My approach is to create a time sequence and individually specify each holiday dates using a vector; the first element is the start date of the holiday and the second is the end of the holiday.

I then create an if-or statement to test whether the time lies within any of the holiday dates specified in the vectors.

Here is my code so far:

start <- as.POSIXlt("2015-10-10 00:00:00")
end <- as.POSIXlt("2016-03-31 00:00:00")

DateSeq <- seq(from=start, to=end, by="mins")

#Holidays defined using a vector with by start and end date

H1 <- c("2015-10-26", "2015-11-3") #October half term
H2 <- c("2015-12-16", "2016-01-05") #Christmas holiday
H3 <- c("2016-02-15", "2016-02-19") #Feb half term 
H4 <- c("2016-03-24", "2016-03-31") #Easter holiday 

date_table <- data.frame(Time = DateSeq)

if ((round(date_table$Time, units = "days")== H1[1] <> H1[2]) |         (round(date_table$Time, units = "days") == H2[1] <> H2[2])) {
  date_table$Holiday <- "Holiday"
 } else {
  date_table$Holiday <- "Term"
}

As you can see this code does not work and simply labels all the rows as "Term".

Therefore I am wondering the following:

  1. How I can specify the date range in the holiday vectors so it can be used within the if statement?

  2. Whether this is the best approach to use? I am quite new to R and I was thinking of alternatives such as creating a set of individual sequences for term and holidays and then stitching them together. This approach however seems fiddly but would appreciate your thoughts.

Thanks for your help.

Frank
  • 66,179
  • 8
  • 96
  • 180
alkey
  • 986
  • 4
  • 16
  • 33

2 Answers2

1

Similar to your approach by avoid if-else.

# Or alternatively by="mins"
DateSeq <- seq(from=start, to=end, by="hours")
date_table <- data.frame(Time = DateSeq)

# Put all holidays together
H<-rbind(H1,H2,H3,H4)

# Or alternatively set to "Term"
date_table$Holiday<-0

# Assign all the holidays
for (i in 1:nrow(H)){
  date_table[date_table$Time > H[i,1] &  date_table$Time < H[i,2],"Holiday"]<-1 # or "Holiday"
}

# Check they are correctly assign
plot(date_table)
fishtank
  • 3,718
  • 1
  • 14
  • 16
  • Thanks for your answer, it works really well however the last day of the holiday remains labelled as Term. I have tried replacing the < with <= and [i+1,2] and H[i,2]+1, but the I cannot get the final specified holiday date to be labelled as holiday. Do you have any suggestions why this could be happening? Or should I just specify one extra day in the holiday vector? Thanks again for your help :) – alkey Apr 05 '16 at 18:00
  • You can change the `date_table$Time < H[i,2]` to `date_table$Time < as.POSIXlt(H[i,2])+86400` which essentially adds an extra day or you can add the extra day into your holiday vector. – fishtank Apr 05 '16 at 21:44
  • Hi, sorry to bother you again but I was wondering if you knew a way of removing the row completely from the dataframe rather than adding a label? I tried using this solution [link] (http://stackoverflow.com/questions/6650510/remove-rows-from-data-frame-where-a-row-match-a-string) and used !data_table$Time but this did not work. Do you have a solution or know where I could find it? Thanks – alkey Apr 06 '16 at 15:58
  • Which row do you want to remove? Your question is unclear. There are several ways to index the row (http://www.r-tutor.com/r-introduction/data-frame/data-frame-row-slice) – fishtank Apr 06 '16 at 17:46
  • Hi thanks for your answer. I finally found the solution to delete the rows: This is what I was trying to achieve: `for (i in 1:nrow(H)){ date_table <- date_table[!(date_table$Time > H[i,1] & date_table$Time < as.POSIXlt(H[i,2])+86400),] }` – alkey Apr 07 '16 at 08:28
  • I just noticed another problem with this approach, there is a slight overlap and the first minute is labelled as Term when it was specified as Holiday. For example 26/10/2015 is specified as a Holiday, however, the first minute - 26/10/2015 00:00 is labelled as Term and then 26/10/2015 00:01 is listed as Holiday as expected. Do you have any idea why this is happening? – alkey Apr 07 '16 at 14:45
  • Change `date_table$Time > H[i,1]` to `date_table$Time >= H[i,1]` should take care of it. – fishtank Apr 07 '16 at 16:57
  • Oh thank you so much for your help. Turned out to be an easy change. I spent all day on it but could not see it. In the end I had to change the first term to Posixlt also and used `for (i in 1:nrow(H)){ date_table <- date_table[!(date_table$Time >= as.POSIXlt(H[i,1]) & date_table$Time < as.POSIXlt(H[i,2])+86400),] }` Thanks again :) – alkey Apr 08 '16 at 10:19
0

You could use cut() to cut DateSeq into term and holiday.

res <- cut(DateSeq, breaks = as.POSIXlt( c(H1, H2, H3, H4), format = "%Y-%m-%d"), labels = c("October", "term1", "Christmas", "term2", "Feb", "Term3", "Easter"))
table(res)
Richard Telford
  • 9,558
  • 6
  • 38
  • 51
  • Thanks for your answer, it seems this solution splits the date sequence into each term and then counts the minutes. How could I then enter this information into a dataframe? – alkey Apr 05 '16 at 17:30
  • the table is just to show that `cut` has worked. What do you want entering into a data.frame? – Richard Telford Apr 05 '16 at 18:24