
I have a vector of year-month values coded as YYYYMM numbers, like this:

ym <- c(
  201401,
  201403:201412,
  201501:201502,
  201505:201510,
  201403
)

And I'd like to end up with a vector that looks like this:

 [1]  1  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  1

That is, I want to number the records within each run of consecutive months, restarting at 1 whenever a month is skipped or the sequence jumps backwards. Can anyone recommend an approach? I've been spinning my wheels with something like this:

ym_date <- as.Date(paste0(ym, "01"), format = "%Y%m%d")

diff(ym_date)

but haven't been able to get any further because I'm not sure how to flag the start of a sequence when dealing with months (the day differences vary with month length). Any solution (base R, tidyverse, data.frame-centric or not) would be welcome.
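Since months differ in length, differences in days are awkward to test. One alternative is to skip `Date` objects and map each YYYYMM code to a running month index, year * 12 + month, so that consecutive months always differ by exactly 1. A minimal base R sketch, assuming every element of `ym` is a valid YYYYMM code (month 01-12); the function name `count_month_runs` is hypothetical:

```r
# Hypothetical helper: position of each entry within its run of consecutive months.
# Assumes ym holds numeric YYYYMM codes with months 1-12.
count_month_runs <- function(ym) {
  year  <- ym %/% 100          # e.g. 201403 -> 2014
  month <- ym %% 100           # e.g. 201403 -> 3
  idx   <- year * 12 + month   # running month index; consecutive months differ by 1
  # A new run starts at the first element and wherever the step is not exactly +1
  run_id <- cumsum(c(TRUE, diff(idx) != 1))
  # Number the elements 1, 2, ... within each run
  ave(idx, run_id, FUN = seq_along)
}

ym <- c(201401, 201403:201412, 201501:201502, 201505:201510, 201403)
count_month_runs(ym)
# returns 1 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 1
```

Because the index includes the year, December to January continues a run, while a jump such as 201401 to 201502 starts a new one.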

boshek

2 Answers


We can use lubridate to get the month, then build a running index that adds 12 after every December:

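library(lubridate)
mth <- month(ym_date)  # month number (1-12) of each date
# add 12 for every December seen earlier, so the index keeps rising across a year boundary
new <- mth + cumsum(c(0, (mth %/% 12)[-length(mth)])) * 12
# a new run starts wherever the index does not step up by exactly 1
ave(mth, cumsum(c(TRUE, diff(new) != 1)), FUN = seq_along)
#[1]  1  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  1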

It can also be written more compactly:

ave(mth, cumsum(c(TRUE, diff(c(0, head(cumsum(mth == 12), -1)) * 12 + mth) != 1)), FUN = seq_along)
#[1]  1  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  1
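In this answer, `new` adds 12 for every December already seen, so for this data the 2015 months become 13, 14, 17, ..., 22 and `diff(new) != 1` flags each run start. Because the index is built from the month alone, a gap that skips a year without passing through a December goes unnoticed. A small check of that edge case, assuming the month is taken with `ym %% 100` instead of `lubridate::month()` (equivalent for valid YYYYMM codes) so the sketch needs no packages; `run_counts_by_month` is a hypothetical wrapper name:

```r
# Hypothetical wrapper around the same month-only construction.
# The month is taken with %% 100 instead of lubridate::month(), so no packages are needed.
run_counts_by_month <- function(ym) {
  mth <- ym %% 100
  # add 12 for every December seen before the current element
  new <- mth + cumsum(c(0, (mth %/% 12)[-length(mth)])) * 12
  ave(mth, cumsum(c(TRUE, diff(new) != 1)), FUN = seq_along)
}

ym <- c(201401, 201403:201412, 201501:201502, 201505:201510, 201403)
run_counts_by_month(ym)                 # 1 1 2 ... 12 1 2 3 4 5 6 1, as expected

# Edge case: January 2014 followed directly by February 2015.
# No December lies between them, so new is c(1, 2) and the pair
# is counted as one run.
run_counts_by_month(c(201401, 201502))  # 1 2
```

A year-aware index such as year * 12 + month would treat that pair as two separate runs.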
akrun
Oops, I screwed up my expected output. I need the count to span the end of the year and into the next. So the output _should_ be `1 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 1` – boshek Feb 14 '20 at 22:57

Maybe you can try the following base R code with rle:

# run lengths of the consecutive-month groups, each expanded to 1..n
r <- unlist(lapply(rle(cumsum(c(1, round(as.numeric(diff(ym_date)) / 30.24) != 1)))$lengths, seq_len))

or with ave:

r <- ave(ym, cumsum(c(1, round(as.numeric(diff(ym_date)) / 30.24) != 1)), FUN = seq_along)

which gives

> r
 [1]  1  1  2  3  4  5  6  7  8  9 10 11 12  1  2  3  4  5  6  1
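In the rle variant, the last step turns run lengths into within-run counters. Base R's `sequence()` does exactly that (`sequence(c(1, 12, 6, 1))` has the same values as `c(1, 1:12, 1:6, 1)`), so a sketch of the same idea, keeping this answer's 30.24-day rounding heuristic, could read:

```r
ym <- c(201401, 201403:201412, 201501:201502, 201505:201510, 201403)
ym_date <- as.Date(paste0(ym, "01"), format = "%Y%m%d")

# Approximate step in months between neighbouring dates, from the day difference
steps <- round(as.numeric(diff(ym_date)) / 30.24)
# Start a new run wherever the step is not exactly one month
run_id <- cumsum(c(1, steps != 1))
run_lengths <- rle(run_id)$lengths  # c(1, 12, 6, 1) for this data
# sequence() concatenates 1:n for each run length n
r <- sequence(run_lengths)
```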
ThomasIsCoding