I am new to R. I have multiple files in a directory on my local PC. I have imported them into R and added column names as shown below. Now I need to add to each data frame the year that corresponds to its file name. For example, the first file is called 1950, the second 1951, and so on. How do I add the year as a column with these values in R?

The output is below
      Name Sex Number
1    Linda   F     10
2     Mary   F    100
3  Patrick   M    200
4  Barbara   F    300
5    Susan   F    500
6  Richard   M    900
7  Deborah   F    500
8   Sandra   F     23
9    Conor   M     15
10   Conor   F    120

I need another column at the start that holds the year for this file.

This is my code to generate the above.

ldf <- list() # empty list to hold one data frame per file
listtxt <- dir(pattern = "\\.txt$") # names of all the .txt files in the working directory
#Year = 1950
for (k in seq_along(listtxt)) # e.g. 1:4 when there are 4 files
{
  ldf[[k]] <- read.table(listtxt[k], header = FALSE, sep = ",")
  colnames(ldf[[k]]) <- c('Name', 'Sex', 'Number')
  #test = cbind(ldf[[k]], Year )
}

I need the year to increase by 1 for each file, and to add it as a column holding that value. Any help would be greatly appreciated.

oldtimetrad

2 Answers

You can add a column with the year by getting the year directly from the file name. I've also used lapply instead of a loop to cycle through each of the files.

In the code below, the function reads a single file and adds a column with the year of that file. Since your file names start with the year, you can get it from the file name with substr(x, 1, 4), which takes the first four characters. lapply applies the function to every file name in listtxt, resulting in a list where each element is a data frame. Then you just rbind all of the list elements into a single data frame.

ldf = lapply(listtxt, function(x) {

      dat = read.table(x, header=FALSE, sep=",")

      # Add column names
      names(dat) = c('Name', 'Sex', 'Number')

      # Add a column with the year
      dat$Year = substr(x,1,4)

      return(dat)
})

# Combine all the individual data frames into a single data frame
df = do.call("rbind", ldf)
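One thing to be aware of: substr returns text, so the Year column above is a character vector and is added as the last column. If you want a numeric year as the first column, something along these lines might work. This is only a sketch under some assumptions: the temporary directory, the two demo files, and the helper name read_year_file are made up for illustration, and it assumes each file name is just the year plus the .txt extension.

```r
# Hypothetical demo data: two small files named by year in a temp directory
tmp <- tempfile("years_")
dir.create(tmp)
writeLines(c("Linda,F,10", "Mary,F,100"), file.path(tmp, "1950.txt"))
writeLines("Patrick,M,200", file.path(tmp, "1951.txt"))

# full.names = TRUE returns paths, so the year is taken from basename() below
files <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)

# read_year_file is an illustrative helper, not part of any package
read_year_file <- function(path) {
  # colClasses stops a Sex column containing only "F" being read as logical FALSE
  dat <- read.table(path, header = FALSE, sep = ",",
                    col.names = c("Name", "Sex", "Number"),
                    colClasses = c("character", "character", "integer"))
  # Drop the directory and the extension, then convert "1950" to 1950L
  year <- as.integer(tools::file_path_sans_ext(basename(path)))
  # Year goes first; the single value is recycled to every row
  data.frame(Year = year, dat, stringsAsFactors = FALSE)
}

df <- do.call(rbind, lapply(files, read_year_file))
print(df)
```

Taking the year from the name rather than counting up from 1950 in a loop means a missing or out-of-order file cannot shift the years.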

Instead of do.call("rbind", ldf) you can also use rbind_all from the dplyr package, as follows:

library(dplyr)
df = rbind_all(ldf)
eipi10

I couldn't add this as a comment on @eipi10's answer above, so I'll have to do it here. I just tried this and it worked perfectly (thanks, I'd searched for hours with no luck), but I got a message that rbind_all is deprecated. The dplyr solution is now:

library(dplyr)
df = bind_rows(ldf)
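As a possible variation, bind_rows has an .id argument that fills a new first column from the names of the list elements, so the year can be added while combining instead of inside the reading function. The sketch below assumes the list elements are named after their years; the two small data frames are made-up stand-ins for the files read from disk.

```r
library(dplyr)

# Hypothetical stand-ins for the data frames read from 1950.txt and 1951.txt
ldf <- list(
  data.frame(Name = c("Linda", "Mary"), Sex = c("F", "F"),
             Number = c(10L, 100L), stringsAsFactors = FALSE),
  data.frame(Name = "Patrick", Sex = "M", Number = 200L,
             stringsAsFactors = FALSE)
)

# Name each list element after the year of its (assumed) source file
names(ldf) <- c("1950", "1951")

# .id puts each element's name into a new first column, as character
df <- bind_rows(ldf, .id = "Year")

# Convert the year from text to integer
df$Year <- as.integer(df$Year)
print(df)
```

With real files, the names could be set from the file names, for example with substr(listtxt, 1, 4) as in the answer above.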
Emilio M. Bruna