There are a lot of ways to do what you are asking. Also, without a reproducible example it is difficult to confirm that the code will run. I couldn't tell what type of data was in each of your variables, so I guessed that the value you want to keep is numeric and everything else is character. You'll need to change the code if that's not true.
The following method uses base R (no other packages) and builds on what you have already done. There are other ways to do this, using `map` (from the purrr package), `do.call`, or `apply`, but it's important to be able to work through a loop.
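For comparison, here is a minimal sketch of the `do.call` style in base R. It assumes a hypothetical file layout (columns `name`, `time`, `value`, `path`) and uses in-memory data frames in place of files; `make_run()` and `extract_d40()` are illustrative names, not anything from your code. With real files you would build the list with `lapply(filenames, read.table, header = TRUE)`.

```r
# Hypothetical stand-in for the files: a list of data frames, one per "file".
# The columns (name, time, value, path) are assumptions, not your real layout.
make_run <- function(offset) {
  data.frame(
    name  = rep("plantNutrientUptake", 50),
    time  = 1:50,
    value = (1:50) + offset,
    path  = rep("//plants/nitrate", 50),
    stringsAsFactors = FALSE
  )
}
files <- list(run_a = make_run(0), run_b = make_run(100))

# Per-file work returns a one-row data frame instead of writing into df[i, ]
extract_d40 <- function(file) {
  rows <- subset(file, name == "plantNutrientUptake" & path == "//plants/nitrate")
  data.frame(d40 = rows[41, 3])
}

# lapply() runs the function on each element; do.call(rbind, ...) stacks the rows
df <- do.call(rbind, lapply(files, extract_d40))
print(df)
```

The trade-off is that nothing needs to be preallocated, but the per-file logic has to be wrapped in a function first.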
As someone commented, your code overwrites the same variables on every pass through the loop. Luckily, you have the loop index `i`, which you can use to specify where each result goes.
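To see the difference `i` makes, here is a toy sketch (made-up numbers, not your data): the first loop overwrites one variable, while the second stores each result in its own slot.

```r
values <- c(10, 20, 30)

# Overwrites: after the loop, `last` only holds the final iteration's value
for (i in seq_along(values)) {
  last <- values[i] * 2
}

# Uses i to choose a slot, so every iteration's result is kept
doubled <- numeric(length(values))
for (i in seq_along(values)) {
  doubled[i] <- values[i] * 2
}

print(last)     # 60
print(doubled)  # 20 40 60
```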
```r
filenames <- list.files(path = getwd())
numfiles <- length(filenames)

# Declare an empty data frame up front, with an explicit type for each column
df <- data.frame(
  filename = rep(NA_character_, numfiles),
  d40 = rep(NA_real_, numfiles),
  stringsAsFactors = FALSE
)

# Loop through the files and fill in row i of the data frame
for (i in seq_len(numfiles)) {
  file <- read.table(filenames[i], header = TRUE)
  # Intermediate subsets are ordinary variables; it's fine that they are overwritten
  ts <- subset(file, name == "plantNutrientUptake")
  tss <- subset(ts, path == "//plants/nitrate")
  tssc <- tss[, 2:3]
  # Only the values you want to keep go into row i
  df$filename[i] <- filenames[i]
  df$d40[i] <- tssc[41, 2]
  print(df$d40[i])
  print(filenames[i])
}
```
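If you want to try the loop before pointing it at your real folder, here is a self-contained sketch. It writes two small sample files to a temporary directory; the column layout and the `.txt` extension are assumptions. It also passes `pattern` and `full.names = TRUE` to `list.files()`, so stray files (such as your R script) are skipped and `read.table()` gets full paths regardless of the working directory.

```r
# Hypothetical sample data: two whitespace-separated files in a temporary folder.
# Column names (name, time, value, path) and the ".txt" extension are assumptions.
data_dir <- file.path(tempdir(), "uptake_demo")
dir.create(data_dir, showWarnings = FALSE)
for (offset in c(0, 100)) {
  sample_df <- data.frame(
    name  = "plantNutrientUptake",
    time  = 1:50,
    value = (1:50) + offset,
    path  = "//plants/nitrate"
  )
  write.table(sample_df, file.path(data_dir, paste0("run_", offset, ".txt")),
              row.names = FALSE, quote = FALSE)
}

# Only pick up .txt files, and return full paths
filenames <- list.files(path = data_dir, pattern = "\\.txt$", full.names = TRUE)
numfiles <- length(filenames)

df <- data.frame(
  filename = rep(NA_character_, numfiles),
  d40 = rep(NA_real_, numfiles),
  stringsAsFactors = FALSE
)

for (i in seq_len(numfiles)) {
  file <- read.table(filenames[i], header = TRUE)
  tss <- subset(file, name == "plantNutrientUptake" & path == "//plants/nitrate")
  df$filename[i] <- basename(filenames[i])
  df$d40[i] <- tss[41, 3]   # row 41 of the third column, as in tssc[41, 2]
}
print(df)
```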
You'll notice a few extra things about this code.
First, I'm declaring the type of each column explicitly. You could use `rep(NA, numfiles)`, but that creates a logical column and leaves R to work out the real type from whatever you assign into it. This may not be a problem for you if all of your values are obviously of the same type. But imagine a variable `a` whose values across your files are `"1"`, `"A"`, and `"B"`. If the first file's value is read as the number 1, the column becomes numeric; when the second file's `"A"` arrives, R doesn't stop with an error but silently converts the whole column to character. Because nothing crashes, that kind of problem is easy to miss, and declaring the types up front makes your intent explicit.
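Here is a small sketch of what actually happens to a column created with `rep(NA, n)` as values of different types are assigned into it (toy data, not your files):

```r
# A logical NA column, as created by rep(NA, n)
df <- data.frame(a = rep(NA, 3))
print(class(df$a))  # "logical"

# First value arrives as a number: the column becomes numeric
df$a[1] <- 1
print(class(df$a))  # "numeric"

# A later character value: no error, the whole column silently becomes character
df$a[2] <- "A"
print(class(df$a))  # "character"
print(df$a)         # "1" "A" NA
```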
Next, I'm declaring the entire data frame before entering the loop. When people tell you that loops in modern R are slow, it is often because the code grows an object on every iteration, so R has to reallocate memory and copy the data each time. Declaring the whole data frame up front avoids that and can speed up the loop significantly. It also lets you reference any cell in the data frame, which is exactly what you want to do in the loop.
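One rough way to see the difference yourself is to time both patterns. This is a sketch with hypothetical helpers `grow()` and `prealloc()` and a single numeric column; timings depend on your machine, so none are shown here.

```r
n <- 1000

# Growing: each rbind() builds a new, larger data frame from the old one
grow <- function(n) {
  out <- data.frame(x = numeric(0))
  for (i in seq_len(n)) out <- rbind(out, data.frame(x = i))
  out
}

# Preallocating: the data frame is created once, then cells are filled in
prealloc <- function(n) {
  out <- data.frame(x = rep(NA_real_, n))
  for (i in seq_len(n)) out$x[i] <- i
  out
}

print(system.time(grown <- grow(n)))
print(system.time(filled <- prealloc(n)))
```

Both produce the same values; only the amount of copying differs.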
Finally, I'm using the `$` syntax to make things clear. Writing `df[i, "d40"] <- d40` is the same as writing `df$d40[i] <- d40`. I just think the second form is clearer; this is a matter of personal preference.
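A quick check, using a toy three-row data frame, that both forms write the same cell:

```r
df <- data.frame(d40 = rep(NA_real_, 3))
d40 <- 12.5
i <- 2

a <- df
a[i, "d40"] <- d40   # row/column indexing on the data frame

b <- df
b$d40[i] <- d40      # pick the column first, then the element

print(a$d40)  # NA 12.5 NA
print(b$d40)  # NA 12.5 NA
```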