I have multiple files in a single folder and I'd like to perform the same action on each file using R, without having to change the file name in my script every time. In particular, I have 26 .csv files, and I need to identify the duplicate entries within each one individually. Any advice on the best way to do this is appreciated.
2 Answers
I would use `list.files` within an `lapply`.
For example, I think something like this is a good start:
res <- lapply(list.files(path = FILES_DIRECTORY,
                         pattern = '\\.csv$', ## look only for csv files
                         full.names = TRUE),  ## to get full names (path + filename)
              function(file) {
                ff <- read.csv(file)
                ff[duplicated(ff), ]          ## keep only the duplicated rows
              })
You can also name the resulting list with the file names:
names(res) <- gsub('\\.csv$', '',
                   list.files(path = FILES_DIRECTORY, pattern = '\\.csv$'))
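
With the list named this way, you can pull out one file's duplicates by name, or count how many duplicated rows each file contains (the file name below is hypothetical):

res[['sales_jan']]   ## duplicated rows from sales_jan.csv (hypothetical file name)
sapply(res, nrow)    ## number of duplicated rows found in each file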

agstudy
- I believe you could avoid having to name the list in a separate command if you use `sapply` with `USE.NAMES=TRUE` instead of `lapply`. – Matthew Plourde May 29 '13 at 14:33
- @MatthewPlourde yes, you can use `sapply` here, though you won't get pretty names (you will get the full path names). Personally, I rarely use `sapply`; I prefer `lapply`. – agstudy May 29 '13 at 14:50
- Wonderful, I can't wait to try it. Thank you both for your assistance. – KES May 29 '13 at 20:27
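
For reference, a minimal sketch of the `sapply` variant discussed in the comments above; `simplify = FALSE` keeps the result as a list, and with `full.names = TRUE` the names are the full paths, as noted:

files <- list.files(path = FILES_DIRECTORY, pattern = '\\.csv$', full.names = TRUE)
res <- sapply(files, function(file) {
  ff <- read.csv(file)
  ff[duplicated(ff), ]                  ## keep only the duplicated rows
}, simplify = FALSE, USE.NAMES = TRUE)  ## list named by the full paths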
Alternatively, you could let the calling shell handle this, with something like:

R --save --args *.csv < myScript.R

See How can I read command line parameters from an R script?
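
For reference, a minimal sketch of what myScript.R might look like under that invocation; `commandArgs(trailingOnly = TRUE)` returns everything after `--args`, i.e. the csv names expanded by the shell:

## myScript.R (sketch)
files <- commandArgs(trailingOnly = TRUE)  ## the *.csv names passed after --args
res <- lapply(files, function(file) {
  ff <- read.csv(file)
  ff[duplicated(ff), ]                     ## keep only the duplicated rows
})
names(res) <- basename(files)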

Community