0

I have a problem that I want to ask for your advice.

I am trying to compute "the duration of the previous month's Value in consecutive months" only when there are at least 3 months' of observations in the past.

An example data looks like:

structure(list(Group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2), Month = c(2, 
4, 5, 6, 7, 10, 11, 12, 13, 14, 5), Value = c(0.1, 0.1, 0.1, 
0.1, 0.2, 0.1, 0.1, 0.1, 0.2, 0.2, 0.3)), class = "data.frame", row.names = c(NA, 
-11L), codepage = 65001L)

To clarify, the table below ("Stability" column) is what I am seeking. I added Note for the rationale of Stability:

╔═══════╦═══════╦═══════╦═══════════╦════════════════════════════════════════════════════════════╗
║ Group ║ Month ║ Value ║ Stability ║ Note                                                       ║
╠═══════╬═══════╬═══════╬═══════════╬════════════════════════════════════════════════════════════╣
║   1   ║   2   ║  0.1  ║     na    ║ Because there is no previous month's Value                 ║
╠═══════╬═══════╬═══════╬═══════════╬════════════════════════════════════════════════════════════╣
║   1   ║   4   ║  0.1  ║     na    ║ Because there is no previous month's Value                 ║
╠═══════╬═══════╬═══════╬═══════════╬════════════════════════════════════════════════════════════╣
║   1   ║   5   ║  0.1  ║     na    ║ We cannot know what happened for Month 3                   ║
║       ║       ║       ║           ║ (Or there is no data for 3 consecutive months in the past) ║
╠═══════╬═══════╬═══════╬═══════════╬════════════════════════════════════════════════════════════╣
║   1   ║   6   ║  0.1  ║     na    ║ There is no 3 consecutive months' of data in the past      ║
╠═══════╬═══════╬═══════╬═══════════╬════════════════════════════════════════════════════════════╣
║   1   ║   7   ║  0.2  ║     3     ║ 0.1 (Month 6's Value) was stable for 3 months in the past, ║
║       ║       ║       ║           ║ from Month 4 to 6                                          ║
╠═══════╬═══════╬═══════╬═══════════╬════════════════════════════════════════════════════════════╣
║   1   ║   10  ║  0.1  ║     na    ║ Because there is no previous month's Value                 ║
╠═══════╬═══════╬═══════╬═══════════╬════════════════════════════════════════════════════════════╣
║   1   ║   11  ║  0.1  ║     na    ║ There is no 3 consecutive months' of data in the past      ║
╠═══════╬═══════╬═══════╬═══════════╬════════════════════════════════════════════════════════════╣
║   1   ║   12  ║  0.1  ║     na    ║ There is no 3 consecutive months' of data in the past      ║
╠═══════╬═══════╬═══════╬═══════════╬════════════════════════════════════════════════════════════╣
║   1   ║   13  ║  0.2  ║     3     ║ 0.1 (Month 12's Value) was stable for 3 months in the past ║
╠═══════╬═══════╬═══════╬═══════════╬════════════════════════════════════════════════════════════╣
║   1   ║   14  ║  0.2  ║     1     ║ 0.2 (Month 13's Value) was stable for 1 month (Month 13)   ║
╠═══════╬═══════╬═══════╬═══════════╬════════════════════════════════════════════════════════════╣
║   2   ║   5   ║  0.3  ║     na    ║ Because there is no previous month's Value for Group 2...  ║
╚═══════╩═══════╩═══════╩═══════════╩════════════════════════════════════════════════════════════╝
user14250906
  • 197
  • 8
  • @alex_jwb90 Do you think your previous code (https://stackoverflow.com/questions/63820413/how-to-code-for-the-duration-of-consecutive-monthly-stability-in-r/63820706#63820706) could work by adding condition tests? – user14250906 Sep 24 '20 at 17:48

1 Answers1

1

Noticed too late that the Group actually matters so it's not complete.

setDT(test)
test[, StabilityValues:= NA]
test[, ConsMonths := NA]
test[, StabilityValues := shift(unlist(lapply(rle(test$Value)[["lengths"]], seq)), type = "lag")]
test[, ConsMonths := shift(unlist(lapply(rle(diffinv(diff(test$Month)!=1))$lengths, seq)), type= "lag")]
test[, Stability := ifelse(StabilityValues < ConsMonths , StabilityValues, ConsMonths)]
test[, M1 := shift(Month, n = 1)]
test[, M2 := shift(Month, n = 2)]
test[, M3 := shift(Month, n = 3)]
test[!((Month - M1 == 1) & (M1 - M2 == 1) & (M2 - M3 == 1)), Stability := NA]
test[, .(Group, Month, Value, Stability )]

EDIT:
Added support for the Group. please note your test case is not good as it would not illustrate a problem in the grouping.

setDT(test)
test[, StabilityValues:= NA]
test[, ConsMonths := NA]
test[, StabilityValues := unlist(lapply(test[, rle(Value), by = Group]$length, seq))]
test[, ConsMonths := unlist(lapply(test[, rle(diffinv(diff(Month) != 1)), by = Group]$lengths, seq))]
test[, StabilityValues := shift(StabilityValues, type = "lag"), by = Group]
test[, ConsMonths := shift(ConsMonths, type = "lag"), by = Group]

test[, Stability := ifelse(StabilityValues < ConsMonths , StabilityValues, ConsMonths)]
test[, M1 := shift(Month, n = 1)]
test[, M2 := shift(Month, n = 2)]
test[, M3 := shift(Month, n = 3)]
test[!((Month - M1 == 1) & (M1 - M2 == 1) & (M2 - M3 == 1)), Stability := NA]
test[, .(Group, Month, Value, Stability )]
Will
  • 910
  • 7
  • 17
  • Thank you @Will. Would it work by including group_by() in front? – user14250906 Sep 27 '20 at 04:51
  • @user14250906 I've added grouping. i would recommend having a more complete test case for validation purpose. – Will Sep 27 '20 at 10:39
  • Thank you so much @Will. I tried adding more data and it seems to work! For the test[, M1 := shift(Month, n = 1)] part, it is not necessary to include group, right? I looked at the output for each line, and it seems okay! – user14250906 Oct 01 '20 at 14:50
  • actually that's why test cases are so important. it might if the data is not ordered same as your example. so one test case would be to have rows mixings groups and months. i would just avoid the problem by doing this in the beginning: test[order(Group, Month)] – Will Oct 01 '20 at 20:48
  • Your code worked perfectly. You are the hero, Will! One thing I'm trying to add is to consider the values that are different within the margin of 0.001 to be the "same". I wanted to ask for your advice if this is something that should be included in the fourth line of the code (i.e., test[, StabilityValues :=....). I thought I can round the Values first, and then rerun your code. – user14250906 Nov 11 '20 at 19:21
  • you can try to play with this rle(round(Value, 3)) adapt your test case by adding 0.0001 to some values, and change the 3 for other digits to confirm it's working. good luck. – Will Nov 11 '20 at 21:47