When I run this loop I can print the results, and I want to create a data frame with this data, but I can't. So far I have this:

filenames <- list.files(path=getwd())  
numfiles <- length(filenames)  
for (i in 1:numfiles) {
  file <- read.table(filenames[i],header = TRUE)
  ts = subset(file, file$name == "plantNutrientUptake")
  tss = subset (ts, ts$path == "//plants/nitrate")
  tssc = tss[,2:3]    
  d40 = tssc[41,2]
  print(d40)
  print(filenames[i]) 
} 
Karolis Koncevičius
Ivan Lopez
  • Hi Ivan, welcome to SO! The problem is that on every iteration of the loop, you're overwriting what you did previously. At the end of the loop, ts, tss, tssc and d40 will only contain content related to the last file in filenames. What is your goal? Can you share a [minimal, reproducible example of your code](https://stackoverflow.com/a/5963610/8485403)? – csgroen Sep 24 '18 at 16:56
  • Thank you! My goal is to create a data.frame with 2 columns and 256 rows, with the values of filenames[i] and d40 in each row, repeated for the 256 files. So far, when I run the code above it prints the 256 values of filenames[i] and d40, but I don't know how to create a data frame from that. I am a beginner in programming. – Ivan Lopez Sep 24 '18 at 17:05

2 Answers

This is not the most efficient way to do this, but it takes advantage of the code you've already written. First, you create a data frame with the columns you want, with one row per file, filled with NA. Then, in each iteration of the loop, you fill in one row of the data frame.

filenames <- list.files(path=getwd())  
numfiles <- length(filenames)

# Create an empty data.frame
df <- data.frame(filename = rep(NA, numfiles), d40 = rep(NA, numfiles))

# seq_len() also behaves correctly when the folder contains no files
for (i in seq_len(numfiles)) {
  file <- read.table(filenames[i], header = TRUE)
  ts = subset(file, file$name == "plantNutrientUptake")
  tss = subset(ts, ts$path == "//plants/nitrate")
  tssc = tss[, 2:3]
  d40 = tssc[41, 2]

  # Fill row i of the data frame
  df[i, "filename"] = filenames[i]
  df[i, "d40"] = d40
}

Hope that does it! Good luck :)

csgroen

There are a lot of ways to do what you are asking. Also, without a reproducible example it is difficult to validate that the code will run. I couldn't tell what type of data was in each of your variables, so I just guessed that they were mostly characters with one numeric. You'll need to change the code if that's not true.

The following method is using base R (no other packages). It builds off of what you have done. There are other ways to do this using map, do.call, or apply. But it's important to be able to run through a loop.
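For comparison, here is a minimal sketch of the lapply and do.call route mentioned above. The helper extract_d40 and the two demo files are made up for illustration; the column names and the row and column positions are assumptions taken from the question's code, so adjust them to your real files.

```r
# Hypothetical helper: pull one value out of a single file.
# Column names (name, path) and positions follow the question's code.
extract_d40 <- function(filename, row = 41) {
  file <- read.table(filename, header = TRUE, stringsAsFactors = FALSE)
  tss <- subset(file, name == "plantNutrientUptake" & path == "//plants/nitrate")
  tssc <- tss[, 2:3]
  data.frame(filename = basename(filename), d40 = tssc[row, 2],
             stringsAsFactors = FALSE)
}

# Made-up demo data: two small files in a temporary directory
demo_dir <- tempfile("demo")
dir.create(demo_dir)
for (k in 1:2) {
  demo <- data.frame(
    name = "plantNutrientUptake",
    time = 0:49,
    value = (0:49) * k,
    path = "//plants/nitrate"
  )
  write.table(demo, file.path(demo_dir, paste0("run", k, ".txt")),
              row.names = FALSE)
}

filenames <- list.files(demo_dir, full.names = TRUE)
rows <- lapply(filenames, extract_d40)  # one single-row data frame per file
result <- do.call(rbind, rows)          # stack the rows into one data frame
print(result)
```

Here each file is handled by a function that returns a one-row data frame, so nothing gets overwritten between files and there is no index to manage.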

As someone commented, your code overwrites its own results on every iteration. Luckily you have the variable i, which you can use to specify where things go.

filenames <- list.files(path=getwd())
numfiles <- length(filenames)

# Declare the full data frame up front, for efficiency,
# with an explicit type for each column
df <- data.frame(
  filename = rep(NA_character_, numfiles),
  d40 = rep(NA_real_, numfiles),
  stringsAsFactors = FALSE
)

# Loop through the files and fill in one row per file
for (i in seq_len(numfiles)) {
  file <- read.table(filenames[i], header = TRUE)
  # The intermediate subsets stay ordinary variables;
  # only the file name and the final value are stored in df
  ts <- subset(file, name == "plantNutrientUptake")
  tss <- subset(ts, path == "//plants/nitrate")
  tssc <- tss[, 2:3]
  df$filename[i] <- filenames[i]
  df$d40[i] <- tssc[41, 2]
}

You'll notice a few things about this code that are extra.

First, I'm declaring the variable type for each column explicitly. You can use rep(NA, numfiles), but that creates a logical column and leaves R to coerce it to whatever type you assign later. This may not be a problem for you if all of your values are obviously of the same type. But imagine you have a variable a = c("1","A","B") of all characters, and the first value is read as a number. On the first iteration of the loop R makes the column numeric. On the second iteration it silently converts the whole column to character, so you end up with a type you did not plan for. Declaring the type up front makes the intent explicit.
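To see how R treats an untyped placeholder column in practice, here is a small sketch with made-up values. The names untyped and typed are only for illustration.

```r
n <- 3

# Untyped placeholder: rep(NA, n) is a logical vector
untyped <- data.frame(a = rep(NA, n))
class(untyped$a)        # "logical"

untyped$a[1] <- 1       # the column is coerced to numeric
class(untyped$a)        # "numeric"

untyped$a[2] <- "A"     # the column is silently coerced to character
class(untyped$a)        # "character"

# Typed placeholder: the column is character from the start
typed <- data.frame(a = rep(NA_character_, n), stringsAsFactors = FALSE)
typed$a[1] <- "1"
typed$a[2] <- "A"
class(typed$a)          # "character"
```

In the untyped version the column changes type twice without any warning, which is the kind of surprise the explicit NA_character_ and NA_real_ placeholders are meant to avoid.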

Next, I'm declaring the entire dataframe before entering the loop. When people tell you that loops in [modern] R are slow, it is often because the loop grows an object and so re-allocates memory on every iteration. By declaring the entire dataframe up front you speed up the loop significantly. This also allows you to reference any cell in the dataframe, which is exactly what you want to do in the loop.
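A rough way to check this on your own machine is to compare a loop that grows a data frame with rbind against one that fills a pre-allocated data frame. This is only a sketch: the functions grow and prealloc and the value of n are made up for illustration, and the timings will vary between machines and R versions.

```r
n <- 2000

# Growing: rbind() builds a new, larger data frame on every iteration
grow <- function(n) {
  df <- data.frame(id = integer(0), value = numeric(0))
  for (i in seq_len(n)) {
    df <- rbind(df, data.frame(id = i, value = sqrt(i)))
  }
  df
}

# Pre-allocating: the data frame has its final size before the loop starts
prealloc <- function(n) {
  df <- data.frame(id = rep(NA_integer_, n), value = rep(NA_real_, n))
  for (i in seq_len(n)) {
    df$id[i] <- i
    df$value[i] <- sqrt(i)
  }
  df
}

# Time both versions; compare the elapsed times printed on your machine
print(system.time(grown <- grow(n)))
print(system.time(filled <- prealloc(n)))

# Both approaches produce the same column contents
stopifnot(identical(grown$id, filled$id),
          identical(grown$value, filled$value))
```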

Finally, I'm using the $ syntax to make things clear. Writing df[i,"d40"] <- d40 is the same as writing df$d40[i] <- d40. I just think the second form is clearer. This is a matter of personal preference.
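If you want to convince yourself that the two forms write to the same cell, a tiny check like this one works. The data frames df1 and df2 and the value 0.5 are made up for illustration.

```r
df1 <- data.frame(d40 = rep(NA_real_, 3))
df2 <- df1

i <- 2
df1[i, "d40"] <- 0.5   # matrix-style indexing: row i, column "d40"
df2$d40[i] <- 0.5      # $ syntax: element i of column d40

identical(df1$d40, df2$d40)
```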

Adam Sampson