How to utilise filename information as a factor in R

Question

I have a problem very similar to this one, but slightly more involved.

I have a heap of csv files that each record 50 observations for a particular replicate of a particular variety.

The files are named Genotype_Rep.csv, and I have been able to figure out how to extract "Genotype" and "Rep" from the filename, thanks to the solution I mentioned above.

However, because each csv file has 50 records, I need to add the Genotype to every line, which doesn't work with the above solution.

Example:

library(plyr)  # provides mdply()

# Assume that the names of the files in the wd have been assigned to 'filenames'.
# Here's a dummy version:

filenames <- c("A_1.csv", "A_2.csv", "B_1.csv", "B_2.csv")

# extract ID from filename
ids <- gsub("([A-Z])_[0-9].csv", "\\1", filenames)

import <- mdply(filenames, read.csv)
import$ID <- IDs[import$Var1]
import$Var1 <- NULL

This works really nicely when each file has one observation, but not when I need to add it to several lines. I've no doubt it's very simple, but if someone could help me out, that would be great.

Can you clarify a little? Is "lines" == "rows" ? What exactly is the "it" in "...I need to add it to several lines..." ? — Carl Witthoft, Feb 12 '12 at 22:17

score 1 · Accepted Answer · answered Feb 12 '12 at 22:42

When I test mdply() for reading multiple data.frames from files, the column that contains the file indices is "X1", not "Var1". So try replacing

import$ID <- IDs[import$Var1]
import$Var1 <- NULL

with

import$ID <- ids[import$X1]
import$X1 <- NULL

(I also figured you meant to use "ids", not "IDs".)
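If you would rather not depend on what the index column happens to be called, a base R approach along the following lines should also work. This is only a minimal sketch: the temporary directory, the dummy `value` column, and the helper name `read_tagged` are assumptions made up for illustration, and it assumes every file is named Genotype_Rep.csv. Each file is read on its own, the parts of its name are added as columns (a single value is recycled to every row), and the pieces are stacked afterwards.

```r
# Create a few dummy files in a temporary directory (illustrative data only)
tmp <- tempfile("csvdemo")
dir.create(tmp)
for (f in c("A_1.csv", "A_2.csv", "B_1.csv", "B_2.csv")) {
  write.csv(data.frame(value = 1:3), file.path(tmp, f), row.names = FALSE)
}

filenames <- list.files(tmp, pattern = "\\.csv$")

# Hypothetical helper: read one file and tag every row with the
# genotype and rep parsed from its file name
read_tagged <- function(f) {
  d <- read.csv(file.path(tmp, f))
  parts <- strsplit(sub("\\.csv$", "", f), "_")[[1]]
  d$Genotype <- parts[1]  # recycled to every row of d
  d$Rep <- parts[2]
  d
}

# Stack the per-file data frames into one data frame
import <- do.call(rbind, lapply(filenames, read_tagged))

# Treat the genotype as a factor
import$Genotype <- factor(import$Genotype)
```

Because the tagging happens before the data frames are combined, it does not matter how many rows each file contains.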

flodel

Yes, you're correct; however, it doesn't solve my underlying problem... import is still a list as long as the number of files. I need a way to get that file information added to every row in the new dataframe – alexwhan Feb 13 '12 at 01:07
Sorry flodel, I just started working on this again, and realised you're completely correct. Thanks so much! – alexwhan Feb 13 '12 at 02:48
