
I have files named <InputData>.<TestName>.csv and I'd like to make graphs for each test. The best way I can see to do this is to make one R table for each TestName. Each test produces the same columns of data, so I'd like to pull all the data for each test into an R data frame with an extra column identifying the input data.

I'd like to do:

read.tables(c("B217.SE.csv", "C10.SE.csv"), sep=",")

and have it produce (for example):

       Filename  col1   col2
1   B217.SE.csv     1      2
2   B217.SE.csv     2      4
3   C10.SE.csv      3      1
4   C10.SE.csv      4      5

What's the right way to do this? Is there an existing function I don't know about, or should I write it out in R using a for loop?

Marek
Thelema

2 Answers


I can't test it on your data, but you'll want to use an apply-type function like this:

# read each file, tag its rows with the file name, then stack them
data <- do.call("rbind", lapply(c("file1.csv", "file2.csv"), function(fn)
           data.frame(Filename=fn, read.csv(fn))
))
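To try the do.call/rbind/lapply pattern without the original files, you can first write two small CSVs to a temporary directory. The file names and column values below are made up for illustration; `basename()` is used only so the Filename column doesn't carry the temp-directory path.

```r
# Write two small example CSVs (hypothetical data) to a temp directory
dir <- tempdir()
write.csv(data.frame(col1 = 1:2, col2 = c(2, 4)),
          file.path(dir, "B217.SE.csv"), row.names = FALSE)
write.csv(data.frame(col1 = 3:4, col2 = c(1, 5)),
          file.path(dir, "C10.SE.csv"), row.names = FALSE)

files <- file.path(dir, c("B217.SE.csv", "C10.SE.csv"))

# Read each file, prepend a Filename column, then stack them row-wise
data <- do.call("rbind", lapply(files, function(fn)
  data.frame(Filename = basename(fn), read.csv(fn))
))
print(data)
```

The result has the layout asked for in the question: one Filename column followed by col1 and col2, four rows in total.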

Or you can simplify the do.call/lapply version by using plyr's ldply. Here's a rough simulation of how that would work (using data frames instead of files):

> df1 <- data.frame(c1=1:5, c2=rnorm(5))
> df2 <- data.frame(c1=3:7, c2=rnorm(5))

In this case I will use get instead of read.csv:

> library(plyr)
> data <- ldply(c("df1", "df2"), function(dn) data.frame(Filename=dn, get(dn)))
> data
   Filename c1          c2
1       df1  1 -0.15679732
2       df1  2 -0.19392102
3       df1  3  0.01369413
4       df1  4 -0.73942829
5       df1  5 -1.27522427
6       df2  3 -0.33944114
7       df2  4 -0.12509065
8       df2  5  0.11225053
9       df2  6  0.88460684
10      df2  7 -0.70710520

Edit

Taking Marek's suggestion, you can define your own read.tables function (base R has no function by that name) that passes any extra arguments through to read.csv:

read.tables <- function(file.names, ...) {
    require(plyr)
    ldply(file.names, function(fn) data.frame(Filename=fn, read.csv(fn, ...)))
}

data <- read.tables(c("filename1.csv", "filename2.csv"))
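If plyr isn't installed, a base-R sketch of the same read.tables helper behaves the same way: the `...` forwards options such as `sep` to `read.csv`. The semicolon-separated example files are an assumption made only to show the pass-through, and this variant stores `basename(fn)` rather than the full path.

```r
# Base-R variant of the read.tables() helper above (no plyr needed).
# Extra arguments in ... are forwarded unchanged to read.csv().
read.tables <- function(file.names, ...) {
  pieces <- lapply(file.names, function(fn)
    data.frame(Filename = basename(fn), read.csv(fn, ...)))
  do.call(rbind, pieces)
}

# Hypothetical semicolon-separated inputs, to exercise the ... pass-through
dir <- tempdir()
f1 <- file.path(dir, "A1.SE.csv")
f2 <- file.path(dir, "A2.SE.csv")
writeLines(c("col1;col2", "1;2", "2;4"), f1)
writeLines(c("col1;col2", "3;1"), f2)

data <- read.tables(c(f1, f2), sep = ";")
print(data)
```

Because `sep` is a formal argument of `read.csv`, passing `sep = ";"` through `...` overrides its comma default.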
Shane
  • Toward full generalization: `read.tables <- function(files, ...) ldply(files, function(f) data.frame(Filename=f, read.csv(f,...)))` (then we can pass extra arguments to `read.csv`) – Marek Jan 20 '10 at 20:56
  •
    I normally do something like `names(file.names) <- basename(file.names); ldply(file.names, read.csv)` - then you don't need to add the file name column yourself. – hadley Jan 21 '10 at 00:54
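hadley's naming trick also has a base-R counterpart: read the files into a named list, then use `Map()` to attach each name as a column before stacking. This is a sketch; the file names are hypothetical and the column name Filename is chosen to match the question.

```r
# Hypothetical example files in a temp directory
dir <- tempdir()
file.names <- file.path(dir, c("B217.SE.csv", "C10.SE.csv"))
write.csv(data.frame(col1 = 1:2, col2 = c(2, 4)), file.names[1], row.names = FALSE)
write.csv(data.frame(col1 = 3:4, col2 = c(1, 5)), file.names[2], row.names = FALSE)

# Name each path by its base name so the list of data frames carries the names
names(file.names) <- basename(file.names)
tables <- lapply(file.names, read.csv)

# Attach each list name as a Filename column, then stack the pieces
data <- do.call(rbind, Map(function(df, nm) data.frame(Filename = nm, df),
                           tables, names(tables)))
rownames(data) <- NULL  # drop the "B217.SE.csv.1"-style row names from rbind
print(data)
```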

Try this:

## take the .csv files in the working directory
files <- list.files(pattern="\\.csv$")
## read data using a loop, tagging each row with its file name
DF <- NULL
for (f in files) {
   dat <- read.csv(f, header=TRUE, na.strings="", colClasses="character")
   dat$Filename <- f
   DF <- rbind(DF, dat)
}
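Growing DF with `rbind()` inside the loop builds a new, larger data frame on every iteration, which gets slow with many files. A common alternative, sketched here with hypothetical files in a temporary directory, is to fill a preallocated list inside the loop and bind everything once at the end.

```r
# Hypothetical example files in a fresh temp subdirectory
dir <- file.path(tempdir(), "csv_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(col1 = 1:2, col2 = c(2, 4)),
          file.path(dir, "B217.SE.csv"), row.names = FALSE)
write.csv(data.frame(col1 = 3:4, col2 = c(1, 5)),
          file.path(dir, "C10.SE.csv"), row.names = FALSE)

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Preallocate one list slot per file and fill it inside the loop
pieces <- vector("list", length(files))
for (i in seq_along(files)) {
  dat <- read.csv(files[i], header = TRUE)
  dat$Filename <- basename(files[i])  # record which file each row came from
  pieces[[i]] <- dat
}

# Bind all pieces in a single call instead of once per iteration
DF <- do.call(rbind, pieces)
print(DF)
```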
larus
  • To track the data by file name, I'd like to get the name out of the `f` variable. I used the `tools` package to get the file name with `file_path_sans_ext`. The only catch is that it prints with a vector index alongside the name. How can I get just the name? – bonCodigo Apr 09 '17 at 07:52
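On the last comment: `tools::file_path_sans_ext()` returns an ordinary character vector, and the `[1]` shown at the console is only the index marker printed by `print()`, not part of the value. The sketch below assumes the question's <InputData>.<TestName>.csv naming with no extra dots in either part; it splits each base name into InputData and TestName columns and then makes one data frame per TestName, which is what the question wants for plotting. The files are hypothetical.

```r
# Hypothetical example files following the <InputData>.<TestName>.csv pattern
dir <- file.path(tempdir(), "tests_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(col1 = 1:2, col2 = c(2, 4)),
          file.path(dir, "B217.SE.csv"), row.names = FALSE)
write.csv(data.frame(col1 = 3:4, col2 = c(1, 5)),
          file.path(dir, "C10.SE.csv"), row.names = FALSE)
write.csv(data.frame(col1 = 5:6, col2 = c(7, 8)),
          file.path(dir, "B217.XY.csv"), row.names = FALSE)

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

pieces <- lapply(files, function(f) {
  stem  <- tools::file_path_sans_ext(basename(f))  # e.g. "B217.SE", a plain string
  parts <- strsplit(stem, ".", fixed = TRUE)[[1]]  # assumes exactly one dot in the stem
  data.frame(InputData = parts[1], TestName = parts[2], read.csv(f))
})
all.data <- do.call(rbind, pieces)

# One data frame per TestName, e.g. by.test$SE and by.test$XY
by.test <- split(all.data, all.data$TestName)
print(by.test$SE)
```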