5

I need to write a function that will count the number of working days (minus weekends, and a vector of other local bank holidays), but the problem I'm coming up against is more simply illustrated with just counting the number of weekdays.

Here is a function that will give the number of weekdays between two dates:

removeWeekends <- function(end, start){

  range <- as.Date(start:end, "1970-01-01")

  range<- range[sapply(range, function(x){
                                if(!chron::is.weekend(x)){
                                  return(TRUE)
                                }else{
                                  return(FALSE)
                                }
                              })]

  return(NROW(range))

}

Which works when it is given a single date for each argument:

removeWeekends(as.Date("2018-05-08"), as.Date("2018-06-08"))
#[1] 24

But when it is given a two vectors from a data frame it fails:

one <- as.Date("2017-01-01"):as.Date("2017-01-08")
two <- as.Date("2018-06-08"):as.Date("2018-06-15")
df <- data.frame(one, two)
removeWeekends(df$two, df$one)
#[1] 375
#Warning messages:
#1: In start:end : numerical expression has 8 elements: only the first used
#2: In start:end : numerical expression has 8 elements: only the first used

I've also tried (which I guessed would not work as the syntax seems off):

lapply(df, removeWeekends, df$two, df$one)
#Error in FUN(X[[i]], ...) : unused argument (17167:17174)

And:

lapply(df[,c("two", "one")], removeWeekends)
#Error in as.Date(start:end, "1970-01-01") :   argument "start" is missing,
# with no default 

I'm assuming it is me misunderstanding the concept of vectorization.

The only other idea I've got is nesting the function within a conditional to see if it's a vector, then calling an apply function on it if it is although I'm not quite sure how I would structure that either.

Nick
  • 799
  • 1
  • 7
  • 18

2 Answers2

4

You have couple of options to support vectorized argument in function. Since, you have already written your function, the easiest option would be to use Vectorize and convert your function to support vectorized arguments. Another, option is to modify your function and re-write it to support vectorized arguments.

Option#1: Using Vectorize

# Function will support vectorized argument with single statement
vremoveWeekends  <- Vectorize(removeWeekends)

# Try vremoveWeekends  function 
df$dayswithoutweekends <- vremoveWeekends(df$two, df$one)

Option#2: Re-write function to support vectorized arguments. I'll prefer this option since, OP got two arguments which are expected to be of same length. Hence, it will be easier to perform error checking on arguments if we re-write it.

# Modified function 
removeWeekendsNew <- function(end, start){
  if(length(start) != length(end)){
    return(0L)  #Error condition
  }
  result <- rep(0L, length(start)) #store the result for each row

  #One can use mapply instead of for-loop. But for-loop will be faster
  for(i in seq_along(start)){     
    range      = seq(start[i], end[i], by="day")
    result[i]  = length(range[!chron::is.weekend(range)])
  }

  return(result)
}

#Use new function:
df$dayswithoutweekends <- removeWeekendsNew(df$two, df$one)

Result: It's same for both options mentioned above.

df
#          one        two dayswithoutweekends
# 1 2017-01-01 2018-06-08                 375
# 2 2017-01-02 2018-06-09                 375
# 3 2017-01-03 2018-06-10                 374
# 4 2017-01-04 2018-06-11                 374
# 5 2017-01-05 2018-06-12                 374
# 6 2017-01-06 2018-06-13                 374
# 7 2017-01-07 2018-06-14                 374
# 8 2017-01-08 2018-06-15                 375

Data:

one <- seq(as.Date("2017-01-01"),as.Date("2017-01-08"), by="day")
two <- seq(as.Date("2018-06-08"),as.Date("2018-06-15"), by="day")
df <- data.frame(one, two)
df
#          one        two
# 1 2017-01-01 2018-06-08
# 2 2017-01-02 2018-06-09
# 3 2017-01-03 2018-06-10
# 4 2017-01-04 2018-06-11
# 5 2017-01-05 2018-06-12
# 6 2017-01-06 2018-06-13
# 7 2017-01-07 2018-06-14
# 8 2017-01-08 2018-06-15
MKR
  • 19,739
  • 4
  • 23
  • 33
  • Thank you for your comment! It did occur to me just to write it the second way - but from what I (felt like I) understood about R and vectorisation, was that loops were to be avoided and were the 'wrong' way to do R. Is this only the case outside of functions, and is it more the overhead of calling the same function n times that is why loops are discouraged? – Nick Jun 11 '18 at 07:48
  • @Nick Its actually other way. `for-loop` is faster than `apply` series functions. But logic-wise `apply` series functions are cleaner and crisp. Hence, preferred. In your case, since `for-loop` is of 2 lines you can use it otherwise you can change it to apply. – MKR Jun 11 '18 at 08:41
  • @MKR ... Please relay the source that mentions `for` is faster than the *apply* family. From your fellow co-answerer below who summarizes in his own past question, the [iteration methods](https://stackoverflow.com/a/29006276/1422451), only `apply` is the same as `for`, the others are C loops calling R functions iteratively. – Parfait Jun 11 '18 at 19:53
  • @Parfait Thanks for asking. Just to avoid another long debate about performance of `apply` vs `for-loop` let me phrase it clearly what I wanted to convey. All I wanted to say was that `apply` family functions are not chosen for better speed rather those are chosen for crisp and neat logic of implementation. Performance, may vary case to case basis. We can find many benchmarking in SO on performance of `for-loop` vs `apply` One such is provided at https://stackoverflow.com/questions/5533246/why-is-apply-method-slower-than-a-for-loop-in-r – MKR Jun 11 '18 at 21:36
  • 2
    @MKR ... agreed on case by case basis. And do note: `apply` (despite its name) is fundamentally different than `lapply` and its wrappers: `s/v/m/tapply`. But we useRs should not make blanket statements that either loop types are discouraged. Cheers. Happy coding! – Parfait Jun 11 '18 at 21:42
3

If you want to fully vectorize this, you will need to think out of the box. What chron::is.weekend does is just checking how many days were Sundays and Saturdays in a certain time preiod. We can calculate this ourselves in a vectorized way because each week has two weekends, and the only tricky part are the left overs.

I wrote the following function to achieve this, though I'm sure it could be improved

frw <- function(two, one) {

  diff_d <- two - one ## difference in days
  l_d <- (two + 4L) %% 7L + 1L ## last day of the remainder 
  weeks <- diff_d %/% 7L ## number of weeks between
  days <- diff_d %% 7L ## days left

  ## calculate how many work days left
  diff_d - 
    ((weeks * 2L) + ((l_d - days < 1) + ((l_d - days < 2) - (l_d == 1L))) +
    (l_d %in% c(1L, 7L))) + 1L

}

You can run it as follows

frw(two, one)
## [1] 375 375 374 374 374 374 374 375

It is by far faster than the mapply version (almost instant), some benchmark on a bigger data:

one <- as.Date("2017-01-01"):as.Date("2030-01-08")
two <- as.Date("2017-05-01"):as.Date("2030-05-08")
df <- data.frame(one, two)

system.time(res_mapply <- vremoveWeekends(df$two, df$one)) # taken from the other answer
#  user  system elapsed 
# 76.46    0.06   77.25 

system.time(res_vectorized <- frw(df$two, df$one))
# user  system elapsed 
#    0       0       0

identical(res_mapply, res_vectorized)
# [1] TRUE
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
  • How would 'full vectorisation' be done in regards to checking other dates from an third vector to ensure the dates between `one` and `two` are not in there (and removing if they are). – Nick Jun 19 '18 at 15:12