I wish to use R to read multiple csv files from a single folder. If I wanted to read every csv file I could use:

list.files(folder, pattern="*.csv")

See, for example, these questions:

Reading multiple csv files from a folder into a single dataframe in R

Importing multiple .csv files into R

However, I only wish to read one of four subsets of the files at a time. Below is an example grouping of four files each for three models.

JS.N_Nov6_2017_model220_N200.csv
JS.N_Nov6_2017_model221_N200.csv
JS.N_Nov6_2017_model222_N200.csv
my.IDs.alt_Nov6_2017_model220_N200.csv
my.IDs.alt_Nov6_2017_model221_N200.csv
my.IDs.alt_Nov6_2017_model222_N200.csv
parms_Nov6_2017_model220_N200.csv
parms_Nov6_2017_model221_N200.csv
parms_Nov6_2017_model222_N200.csv
supN_Nov6_2017_model220_N200.csv
supN_Nov6_2017_model221_N200.csv
supN_Nov6_2017_model222_N200.csv

If I only wish to read, for example, the parms files I try the following, which does not work:

list.files(folder, pattern="parm*.csv")

I am assuming that I may need to use regex to read a given group of the four groups present, but I do not know.

How can I read each of the four groups separately?

EDIT

I am unsure whether I would have been able to obtain the solution from answers to this question:

Listing all files matching a full-path pattern in R

I may have had to spend a fair bit of time brushing up on regex to apply those answers to my problem. The answer provided below by Mako212 is outstanding.

Mark Miller
  • Possible duplicate of [Listing all files matching a full-path pattern in R](https://stackoverflow.com/questions/10353540/listing-all-files-matching-a-full-path-pattern-in-r); in particular, I think the first answer to that question will solve your issue -- it looks like you need to escape the period (right now your pattern is "par", then zero or more "m", then one occurrence of any character, then "csv") – duckmayr Nov 08 '17 at 21:13
  • Get all the filenames (*list.files* returns them in alphabetical order), then use *split*, and read them in chunks using *lapply* or a *for* loop, e.g.: `myFiles <- 1:12; split(myFiles, ceiling(seq_along(myFiles)/3))` – zx8754 Nov 08 '17 at 21:18
  • For "parm": `list.files(folder, pattern="^parm.*?\\.csv")` – Mako212 Nov 08 '17 at 21:27

2 Answers

A quick REGEX 101 explanation:

For the case of matching the beginning and end of the string, which is all you need to do here, the following principles apply. To match files that are .csv and start with parm:

list.files(folder, pattern="^parm.*?\\.csv")

^ asserts we're at the beginning of the string, so ^parm means match parm, but only if it's at the beginning of the string.

.*? lazily matches any characters, stopping as soon as the rest of the pattern can match. In this case, that means matching up to a literal period, written \\. in the pattern.

. means match any character in REGEX, so we need to escape it with \\ to match the literal . (note that R needs the double escape \\ because the backslash must itself be escaped inside a string literal; in languages with regex literals or raw strings a single escape \ is sufficient).

Finally, csv means match csv after the literal period. To be really thorough, we might use \\.csv$, where $ asserts the end of the string. You'd need the dollar sign if you had other files with an extension like .csv2: \\.csv would match .csv2, whereas \\.csv$ would not.
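
To make the anchoring rules concrete, here is a small sketch that tests three patterns against plain strings with grepl, so no files are needed. The first two names come from the question; the .csv2 name is a hypothetical file added only for contrast.

```r
# File names taken from the question, plus a hypothetical ".csv2" file for contrast
files <- c("parms_Nov6_2017_model220_N200.csv",
           "supN_Nov6_2017_model220_N200.csv",
           "parms_backup.csv2")

# The glob-style attempt from the question, read as a regex:
# "par", zero or more "m", any one character, then "csv"
glob_as_regex <- grepl("parm*.csv", files)

# Anchored at the start only: still accepts the .csv2 file
start_only <- grepl("^parm.*?\\.csv", files)

# Anchored at both ends: only the real parms .csv file remains
both_ends <- grepl("^parm.*?\\.csv$", files)

print(data.frame(files, glob_as_regex, start_only, both_ends))
```

The glob-style pattern matches none of these names, which is consistent with the behaviour described in the question.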

In your case, you could simply replace parm in the REGEX pattern with JS, my, or supN to select one of your other file types.

Finally, if you wanted to match a subset of your total file list, you could use the | logical "or" operator:

list.files(folder, pattern = "^(parm|JS|supN).*?\\.csv")

This returns all the file names except the ones that start with my.
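
Putting this together, one possible way to read each of the four groups separately is sketched below. The demo folder, the dummy file contents, and the helper name read_group are assumptions made for illustration; only the file-name scheme comes from the question.

```r
# Hypothetical demo folder with small CSV files named like those in the question
folder <- file.path(tempdir(), "csv_groups_demo")
dir.create(folder, showWarnings = FALSE)
for (p in c("JS.N", "my.IDs.alt", "parms", "supN")) {
  for (m in 220:222) {
    f <- file.path(folder, sprintf("%s_Nov6_2017_model%d_N200.csv", p, m))
    write.csv(data.frame(model = m, value = 1), f, row.names = FALSE)
  }
}

# read_group is a hypothetical helper: it builds "^<prefix>.*\\.csv$" and
# reads every matching file. The prefix is pasted into the regex as-is,
# so it should not contain unescaped regex metacharacters.
read_group <- function(folder, prefix) {
  pattern <- paste0("^", prefix, ".*\\.csv$")
  # full.names = TRUE returns paths that read.csv can open from any directory
  files <- list.files(folder, pattern = pattern, full.names = TRUE)
  names(files) <- basename(files)
  lapply(files, read.csv)
}

prefixes <- c("JS", "my", "parm", "supN")
groups <- lapply(setNames(prefixes, prefixes),
                 function(p) read_group(folder, p))

# Each element of groups is a named list of data frames, one per file
print(names(groups$parm))
```

With this layout, groups$parm holds the three parms data frames, groups$supN the three supN data frames, and so on.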

Mako212

The list.files statement shown in the question is using globs but list.files accepts regular expressions, not globs.

Sys.glob: To use globs, use Sys.glob like this:

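olddir <- setwd(folder)  # switch to the folder, remembering the previous directory
parm <- lapply(Sys.glob("parm*.csv"), read.csv)
setwd(olddir)  # restore the previous working directory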

parm is now a list of data frames read in from those files.
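
If changing the working directory is undesirable, a possible variant is to put the folder into the glob itself with file.path, in which case Sys.glob returns full paths. This is only a sketch: the demo folder and dummy files are assumptions, and it presumes the folder path contains no glob metacharacters of its own.

```r
# Hypothetical demo folder so the example runs anywhere
folder <- file.path(tempdir(), "glob_demo")
dir.create(folder, showWarnings = FALSE)
for (m in 220:222) {
  write.csv(data.frame(model = m),
            file.path(folder, sprintf("parms_Nov6_2017_model%d_N200.csv", m)),
            row.names = FALSE)
}
write.csv(data.frame(model = 220L),
          file.path(folder, "supN_Nov6_2017_model220_N200.csv"),
          row.names = FALSE)

# The glob includes the folder, so the working directory is left untouched
parm_files <- Sys.glob(file.path(folder, "parm*.csv"))
parm <- lapply(parm_files, read.csv)

print(basename(parm_files))
```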

glob2rx: Note that the glob2rx function can be used to convert globs to regular expressions:

parm <- lapply(list.files(folder, pattern = glob2rx("parm*.csv"), full.names = TRUE), read.csv)
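
To see what glob2rx actually produces, the short sketch below prints the converted pattern and applies it to a few plain strings. The first two names are from the question; the .csv2 name is a hypothetical addition.

```r
# Convert the glob to a regular expression and inspect the result
rx <- glob2rx("parm*.csv")
print(rx)

# Two names from the question plus a hypothetical ".csv2" file
files <- c("parms_Nov6_2017_model220_N200.csv",
           "supN_Nov6_2017_model220_N200.csv",
           "parms_backup.csv2")

# value = TRUE returns the matching names rather than their positions
matched <- grep(rx, files, value = TRUE)
print(matched)
```

The converted pattern is anchored at both ends, so the .csv2 name is not matched.
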
G. Grothendieck
  • Note that any parameters of the read.csv() function can be passed as additional arguments within the lapply() function. E.g., use the following if you have a pipe-separated file: `lapply(list.files(folder, pattern = glob2rx("parm*.csv"), full.names = TRUE), read.csv, sep='|')` – Vishal Jul 17 '19 at 14:58
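
A minimal sketch of the point in the comment above, using two hypothetical pipe-separated files written to a temporary folder. The file names and contents are assumptions, and the final rbind step is an optional extra for the case where a single data frame is wanted.

```r
# Hypothetical pipe-separated demo files
folder <- file.path(tempdir(), "pipe_demo")
dir.create(folder, showWarnings = FALSE)
writeLines(c("a|b", "1|2"), file.path(folder, "parms_model220.csv"))
writeLines(c("a|b", "3|4"), file.path(folder, "parms_model221.csv"))

files <- list.files(folder, pattern = glob2rx("parm*.csv"), full.names = TRUE)

# Arguments after the function name are forwarded to read.csv for every file
parm <- lapply(files, read.csv, sep = "|")

# Optionally stack the per-file data frames into one
combined <- do.call(rbind, parm)
print(combined)
```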