I'm trying to load multiple CSV files into R and merge them into one large data frame. Each file is named after the year its data was collected, e.g. BirthWeight1999.csv, BirthWeight2000.csv, BirthWeight2001.csv. I want to add a new column to the data from each file giving the year the data comes from. For example, each file contains the columns MotherWeight, Alcohol and BabyWeight, and I want to add a column called Year whose value is taken from the file name. So for BirthWeight1999.csv, the Year column should contain 1999. I'm having real difficulty working out how to do this. So far I have the files reading in:

    filenames = list.files(pattern = "\\.csv$")
    do.call("rbind", lapply(filenames, read.csv, header = TRUE))

Any help is appreciated.

Thanks :)

CodeLearner
  • Could you show a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? In this case you should show the first few rows of your data (not just the code that creates it, which doesn't help us reproduce it because we don't have the files). Then please include the result of `dput(head(mydata))`, which will let us reproduce your data frame with copy and paste. – David Robinson Oct 14 '14 at 16:36
  • For future reference: you'll get better and faster answers if you title your question well. "Problems" could describe any question on the site, and your question never mentions anything about `subset`, so it's not clear why you're asking about it in the title. – David Robinson Oct 14 '14 at 16:37

1 Answer

If I understand correctly, you can try:

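    # Example file names matching the pattern in the question
    filenames <- paste0("BirthWeight", 1999:2014, ".csv")
    # Extract the year: characters 12 to 15, right after "BirthWeight"
    years <- as.numeric(substring(filenames, 12, 15))
    # Read each file, add its year as a column, then stack the results
    read_with_year <- function(file, year) {
      ret <- read.csv(file)
      ret$year <- year
      ret
    }
    do.call(rbind, mapply(read_with_year, filenames, years, SIMPLIFY = FALSE))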
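
The same idea can be tried end to end without the real data. The sketch below is only an illustration under a few assumptions: it writes two made-up CSV files into a temporary directory, the values in them are invented, and `read_with_year` is a hypothetical helper name. It extracts the year with a regular expression instead of fixed character positions, so it would still work if the file-name prefix changed length, and it names the new column `Year` as in the question.

```r
# Create two tiny made-up CSV files in a temporary directory (illustrative data only)
tmp <- file.path(tempdir(), "birthweight_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(MotherWeight = c(60, 72), Alcohol = c(0, 1), BabyWeight = c(3.1, 3.4)),
          file.path(tmp, "BirthWeight1999.csv"), row.names = FALSE)
write.csv(data.frame(MotherWeight = 65, Alcohol = 0, BabyWeight = 3.3),
          file.path(tmp, "BirthWeight2000.csv"), row.names = FALSE)

# List only the files that follow the BirthWeightYYYY.csv naming pattern
filenames <- list.files(tmp, pattern = "^BirthWeight[0-9]{4}\\.csv$", full.names = TRUE)

# Pull the four-digit year out of each file name
years <- as.integer(sub("^BirthWeight([0-9]{4})\\.csv$", "\\1", basename(filenames)))

# Hypothetical helper: read one file and attach its year as a column
read_with_year <- function(file, year) {
  df <- read.csv(file, header = TRUE)
  df$Year <- year
  df
}

# Read every file with its year, then stack the data frames row-wise
combined <- do.call(rbind, Map(read_with_year, filenames, years))
rownames(combined) <- NULL  # drop the file-path-based row names added by rbind
```

With the real files, replacing `tmp` with the directory that holds them should be the only change needed, assuming every file has the same columns.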
nicola