
How do I match on a forward slash / in a regular expression in R?

As the example below shows, I am trying to find .csv files in a subdirectory, and my attempts to match a literal / are failing. I am looking for a change to my regex in base R, not a function that does this for me.

Example subdirectory

# Create subdirectory in current working directory with two .csv files
# - remember to delete these later or they'll stay in your current working directory!
dir.create(path = "example")
write.csv(data.frame(x1 = letters), file = "example/example1.csv")
write.csv(data.frame(x2 = 1:20), file = "example/example2.csv")

Get relative paths of all .csv files in the example subdirectory

# This works for the example, but could mistakenly return paths to other files based on:
# (a) file name: foo/example1.csv
# (b) subdirectory name: example_wrong/foo.csv
list.files(pattern = "example.*csv", recursive = TRUE)
#> [1] "example/example1.csv" "example/example2.csv"

# This fixes issue (a) but doesn't fix issue (b)
list.files(pattern = "^example.*?\\.csv$", recursive = TRUE)
#> [1] "example/example1.csv" "example/example2.csv"

# Adding / to the end of `example` guarantees we get the correct subdirectory

# Doesn't work: / is special regex and not escaped
list.files(pattern = "^example/.*?\\.csv$", recursive = TRUE)

# Doesn't work: escapes / but throws error
list.files(pattern = "^example\/.*?\\.csv$", recursive = TRUE)

# Doesn't work: even with the \\ escaping in R!
list.files(pattern = "^example\\/.*?\\.csv$", recursive = TRUE)

Some of the solutions above work with regex tools but not in R. I've checked SO for solutions (most related below) but none seem to apply:

Escaping a forward slash in a regular expression

Regex string does not start or end (or both) with forward slash

Reading multiple csv files from a folder with R using regex

socialscientist
    I think you can do this with `dir_ls` from the `{fs}` package. This function, like the others in the package, provides a number of useful options for working with paths in R. – shafee Jul 26 '22 at 08:40

1 Answer


The pattern argument is only matched against file (or directory) names, not the full path to them (even when recursive and full.names are set to TRUE). That's why your last approach doesn't work, even though it is a valid way to match / in a regular expression. In fact / is not a special character in R regular expressions, so the unescaped / in your first attempt would match too if pattern were applied to the path. You can get the correct files by specifying path and setting full.names to TRUE.

list.files(path = 'example', pattern = '\\.csv$', full.names = TRUE)
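
If you do want the regex itself to include the subdirectory, one option is to list the relative paths first and filter them afterwards with grep, which sees the whole string. This is only a sketch: the temporary directory, the extra example_wrong folder, and the variable names root, all_paths, and csv_paths are made up for illustration and are not part of the question's setup.

```r
# Work in a throwaway directory (assumed setup, not from the question)
root <- file.path(tempdir(), "slash_demo")
dir.create(file.path(root, "example"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "example_wrong"), showWarnings = FALSE)
write.csv(data.frame(x1 = letters), file.path(root, "example", "example1.csv"))
write.csv(data.frame(x2 = 1:20), file.path(root, "example", "example2.csv"))
write.csv(data.frame(x3 = 1:3), file.path(root, "example_wrong", "foo.csv"))

# pattern is matched against file names only, so a pattern containing /
# finds nothing
no_match <- list.files(path = root, pattern = "^example/", recursive = TRUE)

# List all relative paths without a pattern ...
all_paths <- list.files(path = root, recursive = TRUE)

# ... then filter the full relative paths; / needs no escaping
csv_paths <- grep("^example/.*\\.csv$", all_paths, value = TRUE)
csv_paths
#> [1] "example/example1.csv" "example/example2.csv"
```

The path argument shown above is simpler when you only need one known folder. The grep route is more useful when the directory part has to be matched by a pattern.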
Robert Hacken
    Note: you can also just post-process with e.g. `gsub` and `basename()` as found here: https://stackoverflow.com/questions/36683359/remove-everything-in-string-up-to-last-forward-slash – socialscientist Jul 31 '22 at 09:44
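
A minimal sketch of that post-processing step, assuming the paths are already in a character vector (the name paths and its contents are just for illustration):

```r
# Hypothetical input: relative paths such as those returned with full.names = TRUE
paths <- c("example/example1.csv", "example/example2.csv")

# basename() drops everything up to the last path separator
file_names <- basename(paths)

# The same result with a regex: remove everything up to the last /
file_names_regex <- sub("^.*/", "", paths)

file_names
#> [1] "example1.csv" "example2.csv"
```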