0

I have two columns name as month_return1 and month_return2: I want to find if month_return1 is greater than month_return2 or not, and if yes since how many rows is it greater. Somewhat like a streak maybe which tells a number (Ex. 3, which means month_return1 > month_return2 since last 3 values.) .

Here is the data I am working on :

enter image description here

Example explanation - For the first five rows the streak will go as 1,2,3,4,5 and at 5th row the number would be 5. On row 6 the counter resets as month_return2 is not greater than month_return1 so it's gonna show 0 or NA till it again finds month_return1 > month_return2.

Hope I'm able to explain properly. Thanks.

  • 2
    You want a cumulative sum of `month_return1 > month_return2` with reset: https://stackoverflow.com/questions/32994060/r-cumulative-sum-by-condition-with-reset – jblood94 May 10 '22 at 11:09

1 Answers1

0

This should do it:

library(dplyr)
library(lubridate)
set.seed(519)
dat <- data.frame(
  date = seq(ymd("2013-01-01"), ymd("2014-12-01"), by="month"), 
  month_return1 = rnorm(24,1,1), 
  month_return2 = rnorm(24, 0,1)
)

dat <- dat %>% 
  mutate(gt = month_return1 > month_return2, 
         spell = as.numeric(gt != lag(gt)), 
         spell = case_when(row_number() == 1 ~ 1, 
                           TRUE ~ spell), 
         spell = cumsum(spell)) %>% 
  group_by(spell) %>% 
  mutate(streak = row_number()) %>% 
  select(date, month_return1, month_return2, streak)
#> Adding missing grouping variables: `spell`
head(dat, n=15)
#> # A tibble: 15 × 5
#> # Groups:   spell [9]
#>    spell date       month_return1 month_return2 streak
#>    <dbl> <date>             <dbl>         <dbl>  <int>
#>  1     1 2013-01-01         0.674         0.439      1
#>  2     1 2013-02-01         2.20         -1.34       2
#>  3     2 2013-03-01         1.09          1.53       1
#>  4     2 2013-04-01        -0.878         0.916      2
#>  5     3 2013-05-01         2.57         -1.36       1
#>  6     4 2013-06-01         0.104         0.272      1
#>  7     5 2013-07-01         1.54         -0.996      1
#>  8     5 2013-08-01         2.10          0.133      2
#>  9     6 2013-09-01        -0.677        -0.376      1
#> 10     7 2013-10-01         0.414        -0.266      1
#> 11     7 2013-11-01         2.22          0.853      2
#> 12     7 2013-12-01         0.698         0.220      3
#> 13     8 2014-01-01        -0.513         1.01       1
#> 14     8 2014-02-01        -1.07          0.276      2
#> 15     9 2014-03-01         1.07          0.530      1

Created on 2022-05-10 by the reprex package (v2.0.1)

DaveArmstrong
  • 18,377
  • 2
  • 13
  • 25
  • I have a tibble to work on, seems it doesn't works without collect() function. Throws an error. – Arihant Jain May 10 '22 at 12:31
  • If you can post an example of data and code that throw the error, we can work from there. Otherwise, if it works with the `collect()` function, then it sounds like you problem is solved. – DaveArmstrong May 10 '22 at 14:16