1

Running into a real head-scratcher and not sure of how to resolve. Really hoping some of you may be able to help. Also, first time I've ever contributed to StackOverflow....yay!

library(tidyverse)
library(lubridate)

start_date <- ymd("2014-06-28")
end_date <- ymd("2019-06-30")
PayPeriod_EndDate <- seq(start_date, end_date, by = '2 week')
PayPeriod_Interval <- int_diff(PayPeriod_EndDate)

This creates a vector of intervals, with each interval representing a pay period of two weeks in length. This is part one, and part one is relatively easy (though still took awhile to figure out, ha).

Part two contains a vector of dates.

Dates <- c("2014-07-08", "2018-10-20", "2018-12-13", "2018-12-13", "2018-12-06", "2018-11-30", "2019-01-16", "2019-01-23", "2019-03-15", "2018-10-02")

I want to identify Dates %within% Intervals, with the output being the interval that each date is within. So Date "2014-07-08" will be assigned 2014-06-28 UTC--2014-07-12 UTC, since this dates is within this interval.

A very similar problem seems to have been explored here...https://github.com/tidyverse/lubridate/issues/658

I have attempted the following

ymd(Dates) %within% PayPeriod_Interval

However, the result only calculates for the first element in the Dates vector. I have since tried various combinations of for loops, mutating into factors, etc... with little progress. This is work related so am really on a time-deficit and will be monitoring this post throughout the day and into the weekend.

Best and thank you! James

James Crumpler
  • 192
  • 1
  • 8
  • Btw, the dataset contains 7.3 million rows of "Dates" and the vector of intervals contains 130 intervals. Attempted map_chr briefly, but ran into a same length problem and abandoned the attempt. – James Crumpler Nov 01 '19 at 12:13
  • Congratz on your first post! This is unclear "takes calculates". Also, could you clarify what your expected output should look like? – s_baldur Nov 01 '19 at 12:26
  • Welcome to SO! You can edit your question, if you have additional information! – kath Nov 01 '19 at 12:27
  • Will edit of course, and the "takes calculates" statement is definitely unclear. Thank you for point it out! – James Crumpler Nov 01 '19 at 12:31

2 Answers2

3

The tidyverse is very useful but sometimes, base R is all you need. In this case the cut function is all you need.

library(lubridate)

start_date <- ymd("2014-06-28")
end_date <- ymd("2019-06-30")
PayPeriod_EndDate <- seq(start_date, end_date, by = '2 week')

Dates <- c("2014-07-08", "2018-10-20", "2018-12-13", "2018-12-13", "2018-12-06", "2018-11-30", "2019-01-16", "2019-01-23", "2019-03-15", "2018-10-02")


startperiod<-cut(as.Date(Dates), breaks=PayPeriod_EndDate)
endperiod<-as.Date(startperiod)+13

The output from the cut function is the start date of each pay period which the "Dates" variable is located.

Dave2e
  • 22,192
  • 18
  • 42
  • 50
  • This is extremely close to the solution!....I want the end date of the interval rather than the start date of the interval. I wonder if a str_extract would work, using the pattern result of the cut function, there is probably an easier solution. I feel like this is the right process, and just need to tweak the code slightly... – James Crumpler Nov 01 '19 at 12:59
  • You can convert the output back to a date object and add 13 days to it for the last date in the period. See edit above. – Dave2e Nov 01 '19 at 12:59
  • Perfect, Thank you So much Dave2e! – James Crumpler Nov 01 '19 at 13:09
  • 1
    A brief update - the proposed solution worked brilliantly! Ended up completing my part of the project exactly as the work day was drawing to a close. It would not have happened without your help, it saved a lot of time. Thank you! – James Crumpler Nov 02 '19 at 12:25
1

This is how a map - solution could look like:

map(ymd(Dates), ~ PayPeriod_Interval[.x %within% PayPeriod_Interval])
# [[1]]
# [1] 2014-06-28 UTC--2014-07-12 UTC
# 
# [[2]]
# [1] 2018-10-13 UTC--2018-10-27 UTC
# 
# ...

To have the result as a interval vector (and not list) you can use:

PayPeriod_Interval[map_int(ymd(Dates), ~ which(.x %within% PayPeriod_Interval))]

# [1] 2014-06-28 UTC--2014-07-12 UTC 2018-10-13 UTC--2018-10-27 UTC 2018-12-08 UTC--2018-12-22 UTC 2018-12-08 UTC--2018-12-22 UTC 2018-11-24 UTC--2018-12-08 UTC
# [6] 2018-11-24 UTC--2018-12-08 UTC 2019-01-05 UTC--2019-01-19 UTC 2019-01-19 UTC--2019-02-02 UTC 2019-03-02 UTC--2019-03-16 UTC 2018-09-29 UTC--2018-10-13 UTC

If you are just interested in the end date of the interval an option is

PayPeriod_EndDate[map_int(ymd(Dates), ~ which.min(.x > PayPeriod_EndDate))]
# [1] "2014-07-12" "2018-10-27" "2018-12-22" "2018-12-22" "2018-12-08" "2018-12-08" "2019-01-19" "2019-02-02" "2019-03-16" "2018-10-13"

which.min returns number of the entry of the first Date of PayPeriod_EndDate that is not smaller than the specific date in the Dates-vector, thus the Date which is at the end of the specific payment period.

kath
  • 7,624
  • 17
  • 32
  • I have not been able to replicate yet without throwing an error message – James Crumpler Nov 01 '19 at 13:03
  • I guess Dave2e solved the issue for you - if you have trouble or need clarification, state the error message. It is always good to try solutions in a fresh R session with the same data you provided to us - this way you can see whether it is the solution which is causing the trouble or some issues in your original data / package environment etc... – kath Nov 01 '19 at 13:05
  • I have not been able to replicate yet without throwing an error message. I think Dave2e's response resolves the problem I was experiencing. Thank you for putting the answer into map and map_int, as I prefer using map when possible, not sure why I was experiencing an error yet, but will work on it some more. Thank you !! – James Crumpler Nov 01 '19 at 13:21