
I have a set of files with names like this:

01990205abc.dat
20200304abc.dat
20210506abc.csv

My goal is to match the last two strings, but not the first string. I used this pattern: ^(2020)|(2021)[0-9]{4}abc.(csv)|(dat)$ and this code:

files <- list.files(path = "mirror", pattern = "^(2020)|(2021)[0-9]{4}abc.(csv)|(dat)$")
print(files)

This matches all three files instead of just the last two. My expectation was that:

  • (2020)|(2021) would match one of the years, but not 0199.
  • [0-9]{4} would match exactly four digits, then abc, followed by one of the two file extensions.
Konrad Rudolph
Michael A

1 Answer


You could use the pattern

^202[01][0-9]{4}abc\.(?:dat|csv)$


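In an R string literal the backslash itself must be escaped, so the dot is written as \\. there. The same pattern with the doubled backslash and capture groups: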

^(202[01])[0-9]{4}abc\\.(dat|csv)$
  • ^(202[01]) Start of string and capture 2020 or 2021
  • [0-9]{4}abc Match 4 digits 0-9 and abc
  • \\. Match a dot
  • (dat|csv) Capture either dat or csv
  • $ End of string
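
The original pattern matches every file because | has the lowest precedence in a regex, so the alternation splits the whole expression into three independent alternatives rather than applying only to the years or the extensions. A minimal sketch of this, using grepl on the three file names from the question instead of list.files (the variable names are illustrative, not part of the question):

```r
# File names taken from the question.
files <- c("01990205abc.dat", "20200304abc.dat", "20210506abc.csv")

# Original pattern. Because `|` binds most loosely, it is read as
#   "^(2020)"   OR   "(2021)[0-9]{4}abc.(csv)"   OR   "(dat)$"
# so any name that merely ends in "dat" already matches.
original <- "^(2020)|(2021)[0-9]{4}abc.(csv)|(dat)$"
original_hits <- grepl(original, files)

# Corrected pattern: the alternatives are confined to their own groups,
# and the dot is escaped (doubled backslash inside an R string).
fixed <- "^(202[01])[0-9]{4}abc\\.(dat|csv)$"
fixed_hits <- grepl(fixed, files)

print(files[original_hits])  # all three names
print(files[fixed_hits])     # "20200304abc.dat" "20210506abc.csv"
```

The same string can be passed as the pattern argument in the call from the question, e.g. list.files(path = "mirror", pattern = fixed).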


The fourth bird