
I have multiple files in a single folder and I'd like to perform the same action on each file using R, without changing the file name in my script by hand every time. In particular, I have 26 .csv files, and I need to identify duplicate entries in each one individually. Any advice on the best way to do this is appreciated.

KES
2 Answers


I would use `list.files` within an `lapply`. For example, I think something like this is a good start:

 res <- lapply(list.files(path = FILES_DIRECTORY,
                          pattern = '\\.csv$',  ## a regex: match names ending in .csv
                          full.names = TRUE),   ## get full names, path + filename
               function(file) {
                 ff <- read.csv(file)
                 ff[duplicated(ff), ]           ## keep only the duplicated rows
               })

You can also name the resulting list with the file names:

  names(res) <- gsub('[.]csv$', '',
                     list.files(path = FILES_DIRECTORY, pattern = '\\.csv$'))
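From there, a minimal sketch of one way to stack the named results into a single data frame, assuming all 26 files share the same columns (the `all_dups` name is just for illustration):

  ## Bind the per-file duplicates together, tagging each row with the
  ## file it came from; files with no duplicates contribute NULL,
  ## which rbind ignores:
  all_dups <- do.call(rbind,
                      Map(function(d, nm) if (nrow(d) > 0) cbind(source = nm, d),
                          res, names(res)))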
agstudy
  • I believe you could avoid having to name the list in a separate command if you use `sapply` with `USE.NAMES=TRUE` instead of `lapply`. – Matthew Plourde May 29 '13 at 14:33
  • @MatthewPlourde yes, you can use `sapply` here, even though you don't get pretty names (you will get the full path names). Personally, I rarely use `sapply` and prefer `lapply`. – agstudy May 29 '13 at 14:50
  • 1
    Wonderful, I can't wait to try it. Thank you both for your assistance. – KES May 29 '13 at 20:27
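Following up on the `sapply` suggestion in the comments, here is a minimal sketch; with `simplify = FALSE` it behaves like `lapply`, but names the result by the input paths (`USE.NAMES = TRUE` is the default when the input is a character vector):

  res <- sapply(list.files(path = FILES_DIRECTORY,
                           pattern = '\\.csv$',
                           full.names = TRUE),
                function(file) {
                  ff <- read.csv(file)
                  ff[duplicated(ff), ]
                },
                simplify = FALSE)  ## keep a list; names come from the file paths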

Alternatively, you could have the calling terminal handle this, with something like:

R --save --args *.csv < myScript.R

See How can I read command line parameters from an R script?
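Inside the script, `commandArgs(trailingOnly = TRUE)` returns everything after `--args` (here, the shell-expanded .csv file names). A minimal sketch of what `myScript.R` might look like:

 files <- commandArgs(trailingOnly = TRUE)  ## the .csv names passed after --args
 for (f in files) {
   ff <- read.csv(f)
   dups <- ff[duplicated(ff), ]             ## rows that repeat an earlier row
   cat(nrow(dups), "duplicate rows in", f, "\n")
 }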
