2

I have the following data:

library(tidyverse)
library(lubridate)
df <- tibble(date = as_date(c("2019-11-20", "2019-11-27", "2020-04-01", "2020-04-15", "2020-09-23", "2020-11-25", "2021-03-03")))

# A tibble: 7 x 1
  date      
  <date>    
1 2019-11-20
2 2019-11-27
3 2020-04-01
4 2020-04-15
5 2020-09-23
6 2020-11-25
7 2021-03-03

I also have an ordered comparison vector of dates:

comparison <- seq(as_date("2019-12-01"), today(), by = "months") - 1

I now want to compare my dates in df to those comparison dates and so something like:

  • if date in df is < comparison[1], then assign a 1
  • if date in df is < comparison[2], then assign a 2
  • and so on.

I know I could do it with a case_when, e.g.

df %>%
  mutate(new_var = case_when(date < comparison[1] ~ 1,
                             date < comparison[2] ~ 2))

(of course filling this up with all comparisons).

However, this would require to manually write out all sequential conditions and I'm wondering if I couldn't just automate it. I though about creating a match lookup first (i.e. take the comparison vector, then add the respective new_var number (i.e. 1, 2, and so on)) and then match it against my data, but I only know how to do that for exact matches and don't know how I can add the "smaller than" condition.

Expected result:

# A tibble: 7 x 2
  date       new_var
  <date>       <dbl>
1 2019-11-20       1
2 2019-11-27       1
3 2020-04-01       6
4 2020-04-15       6
5 2020-09-23      11
6 2020-11-25      13
7 2021-03-03      17
deschen
  • 10,012
  • 3
  • 27
  • 50
  • 4
    Give `findInterval` a try. – ekoam Jan 19 '22 at 09:44
  • 1
    Related: [Categorize numeric variable into group/ bins/ breaks](https://stackoverflow.com/questions/12979456/categorize-numeric-variable-into-group-bins-breaks); [Categorical date variable in R](https://stackoverflow.com/questions/34451344/categorical-date-variable-in-r) – Henrik Jan 19 '22 at 09:46
  • 1
    `findInterval(df$date,comparison) + 1` gives you the expected result. – Maël Jan 19 '22 at 09:48
  • Wow, didn't think that it could be so easy. Thanks for your help. I'm happy to upvote and accept as an answer if you'd like to reply. – deschen Jan 19 '22 at 10:17

1 Answers1

1

You can use findInterval as follows:

df %>% mutate(new_var = df$date %>%  findInterval(comparison) + 1)
# A tibble: 7 x 2
  date       new_var
  <date>       <dbl>
1 2019-11-20       1
2 2019-11-27       1
3 2020-04-01       6
4 2020-04-15       6
5 2020-09-23      11
6 2020-11-25      13
7 2021-03-03      17
Erfan Ghasemi
  • 337
  • 2
  • 6
  • Although this solution is correct (thanks for it!), it is exactly what was suggested in the comments, so I'd like to wait a bit until one of the commenters replied, since I would find it more fair to accept their answers. – deschen Jan 19 '22 at 11:09