R function loops twice?

Question

I wrote a loop which reads in several text files, performs a few operations on each, and combines them. I've copied it below and annotated each line. However, the first file the loop processes gets read in (and added to my final table) twice! I'd also welcome suggestions for streamlining this loop.

source_files<-list.files(pattern="_output.txt") # This line finds all files whose names contain "_output.txt"

source_files above lists the files to read in the loop below.

for (i in source_files){
    if (!exists("final_table")){
        df_import<-read.table(i, header=FALSE, sep="\t") # reads in each file
        names<-unlist(strsplit(i,"_")) # reformats input file name and parses to 'names'
        df_import$Sample<-names[1] # adds a Sample column holding the first part of the file name
        df_import$DB<-names[2] # adds a DB column holding the second part of the file name
        final_table<-df_import # creates the final table data frame
        rm(df_import) # remove excess df
        }
    if (exists("final_table")){
        df_import<-read.table(i, header=FALSE, sep="\t") # reads in each file
        names<-unlist(strsplit(i,"_")) # reformats input file name and parses to 'names'
        df_import$Sample<-names[1] # adds a Sample column holding the first part of the file name
        df_import$DB<-names[2] # adds a DB column holding the second part of the file name
        final_table <-rbind(final_table, df_import) # Adds to existing final table
        rm(df_import)   
    }
}

This loop is working great, except that final_table has a duplication - any suggestions?

You could also initialize `final_table` outside of the `for` loop, removing the need for `if/else` completely. — Mako212, Jan 07 '19 at 20:02
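One way to read that suggestion, as a minimal sketch: start `final_table` as `NULL` before the loop, since `rbind()` drops `NULL` arguments. The temporary directory and the two tiny files are made up here so the example runs on its own, and the `Sample_DB_output.txt` file-name layout is assumed from the question.

```r
# Hypothetical setup: write two small tab-separated files to a temp directory
# so the sketch is runnable; real code would use the existing files instead.
tmp_dir <- tempfile("loop_demo_")
dir.create(tmp_dir)
write.table(data.frame(a = 1:2, b = c("x", "y")),
            file.path(tmp_dir, "S1_dbA_output.txt"),
            sep = "\t", col.names = FALSE, row.names = FALSE, quote = FALSE)
write.table(data.frame(a = 3:4, b = c("z", "w")),
            file.path(tmp_dir, "S2_dbB_output.txt"),
            sep = "\t", col.names = FALSE, row.names = FALSE, quote = FALSE)

source_files <- list.files(tmp_dir, pattern = "_output\\.txt$")

final_table <- NULL  # initialized once, so no exists() check is needed
for (i in source_files) {
  df_import <- read.table(file.path(tmp_dir, i), header = FALSE, sep = "\t")
  name_parts <- unlist(strsplit(i, "_"))  # split the file name on underscores
  df_import$Sample <- name_parts[1]       # first part of the file name
  df_import$DB <- name_parts[2]           # second part of the file name
  final_table <- rbind(final_table, df_import)  # rbind() ignores the initial NULL
}
```

Because `final_table` is assigned explicitly before the loop, rerunning the script in the same session also starts from an empty table instead of appending to a leftover one.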

score 2 · Accepted Answer · answered Jan 07 '19 at 20:03

Well, you test whether the table exists in the first if, and when it doesn't, that branch creates it from the current file. So when you get to the second if, the table does exist, and the same file's rows are added again. Rather than using two if statements, use one if/else. Also maybe keep just the final_table <-... lines inside the if/else and move the other lines out so you don't have so much repeated code.
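The effect can be reproduced without any files. This is only an illustrative sketch: `inputs` and `demo_table` are made-up names, and three one-row data frames stand in for the input files.

```r
# Stand-in data (made up for illustration): three one-row data frames play
# the role of the three input files.
inputs <- list(data.frame(v = 1), data.frame(v = 2), data.frame(v = 3))

# Start clean so exists() is FALSE on the first pass
if (exists("demo_table")) rm(demo_table)

for (df in inputs) {
  if (!exists("demo_table")) {
    demo_table <- df                     # first pass: the table is created here...
  }
  if (exists("demo_table")) {
    demo_table <- rbind(demo_table, df)  # ...and the same rows are appended here
  }
}

demo_table$v  # 1 1 2 3: the first input appears twice
```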

Maybe something like this:

for (i in source_files){
    df_import<-read.table(i, header=FALSE, sep="\t") # reads in each file
    names<-unlist(strsplit(i,"_")) # splits the file name on underscores
    df_import$Sample<-names[1] # adds a Sample column holding the first part of the file name
    df_import$DB<-names[2] # adds a DB column holding the second part of the file name
    if (!exists("final_table")){
        final_table<-df_import # creates the final table data frame
    } else {
        final_table <-rbind(final_table, df_import) # Adds to existing final table
    }
    rm(df_import) # remove excess df
}

Though there are better ways to do this than looping and rbinding each time. See this answer: What's wrong with my function to load multiple .csv files into single dataframe in R using rbind?
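As a rough sketch of the read-everything-then-bind-once idea (one common pattern, not necessarily what the linked answer does): `read_one()` is a made-up helper name, and the temp directory and files are created here only so the example runs on its own.

```r
# Hypothetical setup so the sketch is self-contained
tmp_dir <- tempfile("bind_demo_")
dir.create(tmp_dir)
for (f in c("S1_dbA_output.txt", "S2_dbB_output.txt")) {
  write.table(data.frame(a = 1:2, b = c("x", "y")), file.path(tmp_dir, f),
              sep = "\t", col.names = FALSE, row.names = FALSE, quote = FALSE)
}

source_files <- list.files(tmp_dir, pattern = "_output\\.txt$", full.names = TRUE)

# read_one() is an illustrative helper: read one file and tag its rows
read_one <- function(path) {
  df <- read.table(path, header = FALSE, sep = "\t")
  name_parts <- unlist(strsplit(basename(path), "_"))
  df$Sample <- name_parts[1]  # first part of the file name
  df$DB <- name_parts[2]      # second part of the file name
  df
}

# Read every file into a list first, then bind the list once
final_table <- do.call(rbind, lapply(source_files, read_one))
```

Binding once at the end avoids copying the growing table on every iteration, which is the usual objection to calling rbind() inside a loop.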

That is more streamlined and fixes the issue. Thanks! — shu251, Jan 07 '19 at 20:41

score 1 · Answer 2 · answered Jan 07 '19 at 20:04

I would take a slightly different approach. It appears the only difference in your if() block is what you do with final_table. I would probably do something along these lines:

#This mimics your list.files() call
list_of_files <- list(mtcars, mtcars, mtcars)

#put the guts of your code inside a function
process_file <- function(file) {
  #your stuff goes here - I'm just going to add a random variable named foo      
  file$foo <- rnorm(nrow(file))
  return(file)
}
#use lapply to iterate over your list of files and do.call to bind them together
output <- do.call("rbind", lapply(list_of_files, process_file))

Created on 2019-01-07 by the reprex package (v0.2.1)

This is also a good use of lapply(), and it can be applied to other kinds of for loops that process several data frames. Thanks! — shu251, Jan 07 '19 at 20:42
