I have a set of files, one for each month and year, spanning 1881-2021. The names follow the format month/year and look like:

  • 01_Jan/193501asc.gz
  • 09_Sep/188209asc.gz
  • 01_Jan/197501asc.gz
  • 07_Jul/202107asc.gz

How can I write a regular expression that keeps only the files from 1970 onward (the period 1970-2021)? I have tried:

file_ls <- list.files(paste(myPath, "data", sep = "/"), 
                          pattern = "[>1970]",
                          #pattern = "[1970-2021]",
                          #pattern="*\\.gz$", # ending character
                          recursive=TRUE)

Expected files to return (years in period 1970-2021):

  • 01_Jan/197501asc.gz
  • 07_Jul/202107asc.gz
Mark Rotteveel
maycca

1 Answer

I don't think a regex is the best tool here. For numerical filtering, I would list all the *.gz files first and then filter file_ls afterwards. For example, you could use:

s <- c("193501asc.gz", "188209asc.gz", "197501asc.gz", "202107asc.gz")

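f <- function(x, y = 1970) {
  first4 <- substr(x, 1, 4)   # the first four characters hold the year
  year <- as.numeric(first4)  # convert to numeric so it can be compared
  year >= y                   # TRUE for files from year y onward
}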

s[f(s)]
#> [1] "197501asc.gz" "202107asc.gz"

Created on 2022-07-27 by the reprex package (v2.0.1)
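
If the paths include the month folder, as in the question, one option is to strip the folder with `basename()` before reading the year, so the position of the year no longer depends on the folder name. This is only a minimal sketch: the helper name `in_year_range` and its `from`/`to` arguments are made up for illustration, and it assumes every file name starts with a four-digit year.

```r
# Example paths, shaped like the output of
# list.files(..., pattern = "\\.gz$", recursive = TRUE)
file_ls <- c("01_Jan/193501asc.gz", "09_Sep/188209asc.gz",
             "01_Jan/197501asc.gz", "07_Jul/202107asc.gz")

# Hypothetical helper: TRUE where the file's year lies in [from, to]
in_year_range <- function(paths, from = 1970, to = 2021) {
  # basename() drops the folder part, so the year is always characters 1-4
  year <- as.numeric(substr(basename(paths), 1, 4))
  year >= from & year <= to
}

recent <- file_ls[in_year_range(file_ls)]
recent
#> [1] "01_Jan/197501asc.gz" "07_Jul/202107asc.gz"
```

Because the filter returns a logical vector, the full relative paths are kept in the result and can be passed straight to whatever reads the files.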

Dan Adams
  • Hi @DanAdams, I have changed my question and now wish to filter the years 1970-2021, so simply using '^20' (which keeps only the years from 2000 on) does not work anymore. Maybe you have another suggestion for how to adapt the regex? Thank you! – maycca Jul 27 '22 at 18:27
  • could just change to `pattern = "^2"` right? – Dan Adams Jul 27 '22 at 18:53
  • To filter the years since 1970? No, unfortunately this does not work... – maycca Jul 27 '22 at 19:06
  • I misread. Ok let me see.. – Dan Adams Jul 27 '22 at 19:07
  • That looks great! I like the little function converting the year back to a numeric value. One last question: how can I operate **within** the individual folders? E.g. if `s <- c("jan_01/193501asc.gz", "feb_02/188209asc.gz", "mar_03/197501asc.gz", "apr_04/202107asc.gz")`? Just use `first4 <- substr(x, 8, 11)`? Or maybe something more versatile? Thank you again! – maycca Jul 27 '22 at 20:12
  • That would probably work. If you can post a new question where you fully explain the situation with some example filenames, you will probably get a better answer. – Dan Adams Jul 27 '22 at 20:28
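
For completeness, the year range can also be spelled out as a regex, although it is harder to read and to change than a numeric filter. Note that `[>1970]` is a character class: it matches any single one of the characters `>`, `1`, `9`, `7`, `0`, not a number greater than 1970. A minimal sketch, assuming every file name is a four-digit year, a two-digit month and then `asc.gz` (the names `year_pattern` and `matched` are made up for illustration):

```r
file_ls <- c("01_Jan/193501asc.gz", "09_Sep/188209asc.gz",
             "01_Jan/197501asc.gz", "07_Jul/202107asc.gz")

# 19[7-9][0-9] covers 1970-1999
# 20[01][0-9]  covers 2000-2019
# 202[01]      covers 2020-2021
# (^|/) anchors the year at the start of the file name, after any folder
year_pattern <- "(^|/)(19[7-9][0-9]|20[01][0-9]|202[01])[0-9]{2}asc\\.gz$"

matched <- grep(year_pattern, file_ls, value = TRUE)
matched
#> [1] "01_Jan/197501asc.gz" "07_Jul/202107asc.gz"
```

As far as I know, `list.files()` applies `pattern` to the file name only and not to the folder part, so when the pattern is passed there it could probably start with a plain `^` instead of `(^|/)`; it is worth checking this on your own data.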