
I have a data frame with dates in it. I want to delete all the rows which do not contain the years from (and including) 2014 - 2021. I've tried my luck with ifelse() and grep() but can't make anything work. Can anyone help me? Thanks!

Data frame:

# A tibble: 6 x 2
  `Legislative period`      Party                            
  <chr>                     <chr>                            
1 "2004/02/02 - 2019/08/13" "Conservative Party of Canada "  
2 "1993/03/11 - 2004/02/01" "Progressive Conservative Party "
3 "2014/07/09 - "           "Conservative Party of Canada "  
4 "2021/09/27 - "           "Independent Senators Group "    
5 "2021/07/29 - 2021/09/26" "Non-affiliated "                
6 "2013/01/25 - "           "Conservative Party of Canada "  

For reproducibility

structure(list(`Legislative period` = c("2004/02/02 - 2019/08/13", 
"1993/03/11 - 2004/02/01", "2010/07/09 - ", "2021/09/27 - ", 
"2021/07/29 - 2021/09/26", "2013/01/25 - "), Party = c("Conservative Party of Canada ", 
"Progressive Conservative Party ", "Conservative Party of Canada ", 
"Independent Senators Group ", "Non-affiliated ", "Conservative Party of Canada "
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
Quantizer

2 Answers


Following the related topic "grep using a character vector with multiple patterns", here is an example that keeps only the rows whose "Legislative period" column contains one of a set of specific strings.

library(dplyr)

# 1. First, build the vector of years you want to focus on (as character strings)
toMatch <- as.character(2014:2021)

# 2. Then keep the rows whose period contains at least one of those years
matches <- filter(yourDataFrame, grepl(paste(toMatch, collapse = "|"), `Legislative period`))
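
To see which rows this kind of pattern match keeps, here is a minimal base R sketch run on the six period strings from the question's dput() output. The names `periods`, `pattern` and `keep` are only illustrative. Note that the match is purely textual: an open-ended period that started before 2014 mentions none of the years and is therefore dropped.

```r
# Period strings copied from the dput() output in the question
periods <- c("2004/02/02 - 2019/08/13", "1993/03/11 - 2004/02/01",
             "2010/07/09 - ", "2021/09/27 - ",
             "2021/07/29 - 2021/09/26", "2013/01/25 - ")

# Collapse the years into one alternation: "2014|2015|...|2021"
pattern <- paste(2014:2021, collapse = "|")

# TRUE where the string mentions any of those years
keep <- grepl(pattern, periods)

# Rows 1, 4 and 5 are kept; rows 2, 3 and 6 mention no year in 2014:2021
periods[keep]
```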
Yacine Hajji

How about this?

library("dplyr")

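data %>% 
  # Parse the first 10 characters as the start date, and whatever follows the
  # 13-character "YYYY/MM/DD - " prefix as the end date (NA if open-ended)
  mutate(start = as.Date(substr(`Legislative period`, 1L, 10L), format = "%Y/%m/%d"),
         end = as.Date(sub("^.{13}", "", `Legislative period`), format = "%Y/%m/%d")) %>%
  # Keep the periods that overlap 2014-2021
  filter(start < as.Date("2022-01-01"), 
         is.na(end) | end >= as.Date("2014-01-01")) %>%
  select(-c(start, end))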

Here is the output:

# A tibble: 5 × 2
  `Legislative period`      Party                          
  <chr>                     <chr>                          
1 "2004/02/02 - 2019/08/13" "Conservative Party of Canada "
2 "2010/07/09 - "           "Conservative Party of Canada "
3 "2021/09/27 - "           "Independent Senators Group "  
4 "2021/07/29 - 2021/09/26" "Non-affiliated "              
5 "2013/01/25 - "           "Conservative Party of Canada "
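
The filter above treats each row as an interval and keeps it when it overlaps 2014-2021: it must start before 2022 and either have no end date or end on or after 2014-01-01. A minimal base R sketch of the same test, without dplyr, might look like this. The names `periods`, `start`, `end` and `overlaps` are illustrative, and the sketch assumes every string follows the "YYYY/MM/DD - YYYY/MM/DD" layout, with the end date possibly missing.

```r
# Period strings copied from the dput() output in the question
periods <- c("2004/02/02 - 2019/08/13", "1993/03/11 - 2004/02/01",
             "2010/07/09 - ", "2021/09/27 - ",
             "2021/07/29 - 2021/09/26", "2013/01/25 - ")

# Characters 1-10 hold the start date; characters 14-23 hold the end date.
# An empty end string parses to NA, which is read here as "still ongoing".
start <- as.Date(substr(periods, 1, 10), format = "%Y/%m/%d")
end   <- as.Date(substr(periods, 14, 23), format = "%Y/%m/%d")

# A period overlaps 2014-2021 if it starts by the end of 2021
# and has not ended before the start of 2014
overlaps <- start <= as.Date("2021-12-31") &
  (is.na(end) | end >= as.Date("2014-01-01"))

# Only the 1993-2004 period (row 2) is dropped
periods[overlaps]
```

Unlike the text match on year strings, this keeps open-ended periods that began before 2014, so which approach fits depends on how "containing 2014-2021" is meant.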
Mikael Jagan
  • Unfortunately, it doesn't do the trick. It's supposed to delete the rows not containing 2014:2021. But thanks for trying! – Quantizer Nov 15 '21 at 11:20
  • Oops - sign error, fixed now. It's still not completely clear from your question whether you would want to keep or delete an interval like `"2004/02/02 - 2019/08/13"`, which does not contain any dates from 2020 or 2021. – Mikael Jagan Nov 15 '21 at 11:40