Possible Duplicate:
When importing CSV into R how to generate column with name of the CSV?

I have a collection of files that I want to load into a single data frame. Each file has the same structure but a different number of rows. Let's say each file represents a single participant. I know I can read them all with the code below:

library(plyr)
files <- c("john.csv","fred.csv","nick.csv","alex.csv")
dfoc <- ldply(files, read.csv, header = TRUE)

Now I want to be able to tell which rows belong to which participant. Before combining the files into one big data frame, I want to add a single column to each of them. That column should have as many rows as the participant's data (e.g. nrow(john)) and should simply contain an identifier, for example the file name repeated nrow(x) times.

Any suggestions?
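For reference, here is a minimal base-R sketch of exactly what is described above: read each file, build the identifier column with `rep()` so its length equals `nrow()` of that file, then stack the results. The helper `read_with_id` and the temporary files standing in for `john.csv` and `fred.csv` are hypothetical, for illustration only.

```r
# Hypothetical stand-ins for per-participant files such as john.csv and fred.csv
john <- tempfile(fileext = ".csv")
fred <- tempfile(fileext = ".csv")
write.csv(data.frame(score = c(1, 2, 3)), john, row.names = FALSE)
write.csv(data.frame(score = c(4, 5)), fred, row.names = FALSE)
files <- c(john, fred)

# Hypothetical helper: read one file and add an id column of length nrow(x)
read_with_id <- function(path) {
  x <- read.csv(path, header = TRUE)
  x$id <- rep(basename(path), nrow(x))  # file name repeated nrow(x) times
  x
}

# Read every file, tag it, and stack the tagged data frames into one
dfoc <- do.call(rbind, lapply(files, read_with_id))
dfoc
```

The result has five rows: three tagged with the first file's name and two with the second's.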

Geek On Acid
  • @joran indeed you are right, I hadn't found it, and it's quite similar. Although there is no solution like the one Josh posted below... – Geek On Acid Nov 21 '12 at 22:23
  • Actually, the solutions at the post I linked to are nearly identical to the one Josh posted below. – joran Nov 21 '12 at 22:29

1 Answer

Here is what I'd do. (The key idea is to place the value of the id column and the just-read-in data.frame together inside a call to data.frame(). R's recycling rules will make the id column have the right length in each case.)
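To see the recycling rule in isolation: a length-one `id` placed next to a three-row data frame is repeated to fill all three rows. The data here are made up for illustration, and `stringsAsFactors = FALSE` is passed only so the result is a character column on older R versions as well.

```r
# A three-row data frame standing in for one participant's file
x <- data.frame(score = c(10, 20, 30))

# data.frame() recycles the length-one id to match the three rows of x
tagged <- data.frame(id = "john.csv", x, stringsAsFactors = FALSE)
print(tagged)
```

Every row of `tagged` gets `id` equal to `"john.csv"`, with no explicit `rep()` needed.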

## Set up a reproducible example
a <- tempfile()
b <- tempfile()
write.csv(head(mtcars, 2), file=a)
write.csv(tail(mtcars, 3), file=b)
fnames <- c(a,b)

## Here's the code you are looking for
do.call(rbind, lapply(fnames, function(X) {
    data.frame(id = basename(X), read.csv(X))})
)
#                 id             X  mpg cyl disp  hp drat    wt  qsec vs am gear carb
# 1 file104862dd45aa     Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# 2 file104862dd45aa Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# 3  file1048d9e5764  Ferrari Dino 19.7   6  145 175 3.62 2.770 15.50  0  1    5    6
# 4  file1048d9e5764 Maserati Bora 15.0   8  301 335 3.54 3.570 14.60  0  1    5    8
# 5  file1048d9e5764    Volvo 142E 21.4   4  121 109 4.11 2.780 18.60  1  1    4    2
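If you would rather see `john` than `john.csv` (or a temp-file name) in the `id` column, one option is to strip the directory and extension with `basename()` and `tools::file_path_sans_ext()`. This sketch keeps the same `do.call(rbind, lapply(...))` pattern; the participant file names are hypothetical and are written to a temporary directory.

```r
# Write two small hypothetical participant files into a temporary directory
dir <- tempdir()
fnames <- file.path(dir, c("john.csv", "fred.csv"))
write.csv(head(mtcars, 2), fnames[1])
write.csv(tail(mtcars, 3), fnames[2])

# Same pattern as above, but the id is the file name without its extension
combined <- do.call(rbind, lapply(fnames, function(X) {
  data.frame(id = tools::file_path_sans_ext(basename(X)), read.csv(X),
             stringsAsFactors = FALSE)
}))
table(combined$id)
```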
Josh O'Brien
  • I think I get your solution, but for some reason it gives me `Error in match.names(clabs, names(xi)): names do not match previous names` when I run `do.call`. I think I need to revise the way I read my filenames, and then it should work... – Geek On Acid Nov 21 '12 at 22:24
  • That sounds suspiciously like not all of the files being read in have the same column names. That's at least what I'd investigate first... Cheers. – Josh O'Brien Nov 21 '12 at 22:26
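One way to follow up on that diagnosis is to read the column names of each file and compare them with the first file's before calling `rbind`. The sketch below deliberately writes one hypothetical file with a differently named column so the check has something to find.

```r
# Hypothetical files: the second one has a misnamed column ("B" instead of "b")
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:2, b = 3:4), f1, row.names = FALSE)
write.csv(data.frame(a = 5:6, B = 7:8), f2, row.names = FALSE)
fnames <- c(f1, f2)

# Read the header plus one data row of each file and keep only the column names
headers <- lapply(fnames, function(f) names(read.csv(f, nrows = 1)))

# Flag every file whose column names differ from the first file's
mismatch <- !vapply(headers, identical, logical(1), headers[[1]])
basename(fnames[mismatch])
```

Any file listed by the last line would trigger the `names do not match previous names` error when passed to `rbind`.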