
I'm very new to R and am working on updating an R script to iterate through a series of .dbf tables created using ArcGIS and produce a series of graphs.

I have a directory, C:\Scratch, that will contain all of my .dbf files. However, when ArcGIS creates these tables, it also writes a .dbf.xml file alongside each one. I want to exclude these .dbf.xml files from my file list and thus from my iteration. I've tried searching and experimenting with regular expressions, to no avail. This is the basic expression I'm using (leaving out my various experiments):

files <- list.files(pattern = "dbf")

Can anyone give me some direction?

Tyler
chawkins
  • If you're struggling with regexps but know the wildcard pattern, the function `glob2rx()` is often helpful. – caracal Feb 02 '11 at 16:37
  • Is it just me or is the title misleading: should read "with only a particular extension" (but I cannot find an answer on SO to excluding certain extensions either) – J. Win. Feb 02 '11 at 17:03
  • caracal, thanks for the suggestion. jonw, I suppose I could have worded it more succinctly, I was just trying to get it posted before a meeting. – chawkins Feb 02 '11 at 17:49
  • it caught my attention because as I learn about regexp, I have been wondering if there is an easy way to exclude. maybe deserves a separate question. – J. Win. Feb 02 '11 at 21:11
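
To illustrate caracal's suggestion, here is a minimal sketch of `glob2rx()` translating a wildcard pattern into a regular expression that can then be passed as `pattern` to `list.files()`. The file names are made up for the demonstration, and `grepl()` stands in for the matching that `list.files()` would do on a real directory.

```r
# glob2rx() ships with R in the utils package
rx <- utils::glob2rx("*.dbf")
rx  # the regular-expression equivalent of the wildcard pattern

# Hypothetical file names, used only to check the translated pattern
files <- c("roads.dbf", "roads.dbf.xml")
files[grepl(rx, files)]
```

The same `rx` value could be used as `list.files(pattern = rx)`.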

6 Answers

files <- list.files(pattern = "\\.dbf$")

The $ at the end anchors the match to the end of the string. "dbf$" will work too, but adding \\. (. is a special character in regular expressions, so you need to escape it) ensures that you match only files with the extension .dbf (in case you have, e.g., .adbf files).
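
A quick way to see the difference between the three patterns is to try them on a character vector. This is only a sketch: the file names are invented, and `grepl()` is used in place of `list.files()` so it runs without any files on disk.

```r
# Hypothetical file names, chosen only to illustrate the patterns
files <- c("roads.dbf", "roads.dbf.xml", "notes.adbf", "dbf_summary.txt")

# Unanchored: matches "dbf" anywhere in the name
loose <- files[grepl("dbf", files)]

# Anchored at the end, but without the escaped dot
anchored <- files[grepl("dbf$", files)]

# Escaped dot plus end anchor: only names ending in ".dbf"
exact <- files[grepl("\\.dbf$", files)]

loose     # all four names
anchored  # "roads.dbf" "notes.adbf"
exact     # "roads.dbf"
```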

Marek
  • Is that case sensitive? – nsn Oct 20 '15 at 16:14
  • @nsn Yes, but if you want otherwise, the function has an `ignore.case` argument, so `list.files(pattern = "\\.dbf$", ignore.case=TRUE)`. Take a look at the help page for that function (`?list.files`) for more details. – Marek Oct 20 '15 at 18:22

Try this, which uses globs rather than regular expressions, so it will only pick out the file names that end in .dbf:

filenames <- Sys.glob("*.dbf")
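
For anyone who wants to try this without touching real data, here is a small sketch that builds a throwaway directory and globs it. The directory name `glob_demo` and the file names are assumptions made up for the example.

```r
# Create a throwaway directory holding hypothetical, empty files
demo_dir <- file.path(tempdir(), "glob_demo")
dir.create(demo_dir, showWarnings = FALSE)
created <- file.create(file.path(demo_dir, c("roads.dbf", "roads.dbf.xml", "parcels.dbf")))

# "*" stands for any run of characters, and the glob has to match the whole name,
# so the .dbf.xml file is not picked up
dbf_paths <- Sys.glob(file.path(demo_dir, "*.dbf"))
basename(dbf_paths)
```

The order of the results is system-specific, so sort them if the order matters.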
G. Grothendieck

Peg the pattern to find "\\.dbf" at the end of the string using the $ character:

list.files(pattern = "\\.dbf$")
Gavin Simpson

This gives you the list of files with their full paths:

  Sys.glob(file.path(file_dir, "*.dbf")) ## file_dir = directory containing the files
Surya

I am not very good at using sophisticated regular expressions, so I'd do such a task in the following way:

files <- list.files()
dbf.files <- files[-grep(".xml", files, fixed=TRUE)]

The first line just lists all files in the working directory. The second drops everything containing ".xml" (grep returns the indices of such strings in the 'files' vector; subsetting with negative indices removes the corresponding entries from the vector). The "fixed" argument to grep is just my whim, as I usually want it to perform crude pattern matching without fancy Perl-style regexps, which may surprise me.

I'm aware that such a solution simply reflects drawbacks in my education, but for a novice it may be useful =) At least it's easy.
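
One caveat worth checking with negative `grep` indices is the case where nothing matches. The sketch below, using invented file names, compares that approach with a logical subset built from `grepl()`, which is offered here as an alternative and not as part of the original answer.

```r
# Hypothetical listing in which no name contains ".xml"
files <- c("roads.dbf", "parcels.dbf")

# grep() finds no matches and returns integer(0); negating it is still integer(0),
# and indexing with an empty vector selects nothing
by_index <- files[-grep(".xml", files, fixed = TRUE)]

# grepl() returns one TRUE/FALSE per name, so "!" keeps every non-matching name
by_logical <- files[!grepl(".xml", files, fixed = TRUE)]

length(by_index)    # 0
length(by_logical)  # 2
```

When at least one name does contain ".xml", both forms return the same result.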

donshikin
  • You should remove the `-` sign before `grep`. I needed this kind of solution to extract specific files from a zip file: first get the file list from the data.frame, then pick out the specific files and extract them later. `lf <- unzip(file, list=T)[,1]; files.shp <- lf[grep(".shp", lf, fixed=T)]` – Sezen Jul 17 '15 at 16:12

Another option is the fs::dir_ls function. It lets you search with either a wildcard pattern (such as "*.dbf") or a regex pattern (such as "dbf$").

fs::dir_ls(dir, recurse = FALSE, glob = "*.dbf")
fs::dir_ls(dir, recurse = FALSE, regex = "dbf$")
dipetkov