I have a folder holding a set of different data files. I would like to count the number of files that contain a given term, like "25" or "color coding", and, if possible, list the names of those files. Is there any way to do that in R?
- Yes, I'm sure there are. [What have you tried?](http://whathaveyoutried.com) (Although, as a hint, if you're trying to search files for a given string, this kind of task is not well suited to R.) – Nov 12 '12 at 16:46
- The best way to do it depends quite a bit on what the format of the files is. – Drew Steen Nov 12 '12 at 16:47
- Does [this](http://stackoverflow.com/questions/371115/count-all-occurrences-of-string-in-lots-of-files-with-grep) help? – liuminzhao Nov 12 '12 at 16:53
1 Answer
Does this do what you need? Note that it matches the terms against the file names, not the file contents.
findTermsInFileNames <- function(terms, theFolder="path/to/Folder/", extension="R", ignoreCase=TRUE) {
  # Looks at all files of type `extension` in `theFolder` and returns, for each
  # of `terms`, the number of file names in which that term appears
  # Note: extension should NOT include a dot. good: "*" bad: ".*"
  # (extension="*" means every file, whatever its extension)
  # str_detect and regex are from stringr
  require(stringr)
  # Get list of files. list.files() expects a regular expression, not a glob,
  # so match a literal dot followed by the extension at the end of the name
  pat <- if (extension == "*") NULL else paste0("\\.", extension, "$")
  filesList <- list.files(path.expand(theFolder), pattern=pat, ignore.case=ignoreCase)
  # For each term, count the file names that contain it;
  # regex() controls whether cAseS should be ignored
  results <- vapply(terms, function(term) {
    sum(str_detect(filesList, regex(term, ignore_case=ignoreCase)))
  }, integer(1), USE.NAMES=FALSE)
  # Label each count with its term
  names(results) <- terms
  return(results)
}
Example:
fold <- "~/git/src"
terms <- c("an", "example", "25")
findTermsInFileNames(terms, fold)
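
Since the question asks about terms inside the files rather than in their names, here is a minimal sketch of that variant using only base R. The helper name `findTermsInFiles` is hypothetical (it is not part of the function above), and the sketch assumes the files are plain text that `readLines()` can read and that the terms should be matched literally rather than as regular expressions.

```r
# Hypothetical helper: searches file CONTENTS instead of file names.
# Assumption: every file in the folder is plain text readable by readLines().
findTermsInFiles <- function(terms, theFolder = ".", ignoreCase = TRUE) {
  files <- list.files(path.expand(theFolder), full.names = TRUE)
  files <- files[!dir.exists(files)]   # skip sub-directories
  # For each term, collect the names of the files with at least one matching line
  hits <- lapply(terms, function(term) {
    found <- vapply(files, function(f) {
      lines <- readLines(f, warn = FALSE)
      if (ignoreCase) {
        # fixed = TRUE cannot be combined with ignore.case, so lower-case both sides
        term  <- tolower(term)
        lines <- tolower(lines)
      }
      any(grepl(term, lines, fixed = TRUE))
    }, logical(1))
    basename(files[found])
  })
  names(hits) <- terms
  hits
}
```

The result is a list with one entry per term holding the matching file names, so `lengths(findTermsInFiles(c("25", "color coding"), "path/to/Folder"))` would give the number of files per term. For binary or very large files a command-line tool such as grep is likely a better fit, as the comments suggest.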

Ricardo Saporta