
I have 50 text files whose names all begin with NEW. I want to loop through each text file (read in as a data frame), run the same functions on each one, and write the results with the write.table function. So the same functions are applied to each of the 50 files, and 50 separate output files should be created, each named after the original file with the word 'output' at the end.

Here is my code. I have used the list.files function to list the 50 files, but how can I adapt my code below so that each of the R commands/functions runs on each of the 50 files independently and then writes the corresponding 50 output files?

file_list <- list.files("/data/genome/relevantfiles/AL", pattern = "^NEW", full.names = TRUE)  #listing the 50 files in my directory whose names start with NEW (pattern is a regex, not a glob)


df <- file_list  #each NEW file should become a df; this does not work - how can I read each file into a data frame?

##code to run on each data frame. It is shown in simplified form below because it already works

#running a function on the dataframe
df_output_results <- coloc.susie(dataset1 = list(beta=df$BETA, varbeta=df$Varbeta, N=1597, type="quant" ...)

#printing out results for each dataframe
final_results <- print(df_output_results$summary)

#outputting results 
write.table(final_results, file = paste0(fileName,"_output.txt"), quote = FALSE, sep = "\t")

I am unsure how to adapt the code so that the files are read into a list, the R code in the code block is applied to each file, and each result is written to a separate file, repeated for all 50 files. I assume I need to use lapply, but I am not sure how. The code in the code block works, so there is no issue with the code itself.

HKJ3
  • Welcome to Stack Overflow! Can you please read and incorporate elements from [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269/1082435). Especially the aspects of using `dput()` for the input and then an explicit example of your expected dataset? – wibeasley Jan 06 '23 at 15:52
  • Also, emphasize the _minimal_ in minimal reproducible example - much of this code is not useful. Please edit your question to include only the minimal data and code needed to reproduce your issue – jpsmith Jan 06 '23 at 16:07
  • Thanks. I just simply need to know how to run 50 files through my code (which works) and then output into 50 corresponding files. I have simplified my example. – HKJ3 Jan 06 '23 at 16:14

1 Answer

From what I understand you want to import 50 files from a folder and store each file in a list. Then you want to loop a function across that list, then export those results somewhere.

I created an example folder on my desktop ("Desktop/SO Example") and put five CSV files in there. You didn't specify what format your files were in, but you can change the below code to whatever import command you need (see ?read.delim). The data in the CSVs are identical and made using:

ex_df <- data.frame(A = LETTERS[1:5], B = 1:5, C = stringr::words[1:5])  # words is the word list shipped with stringr

And look like this:

A B C
A 1 a
B 2 able
C 3 about
D 4 absolute
E 5 accept
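
A minimal sketch for recreating a folder like this, in case it helps to follow along. The temporary directory and the `file_1.csv` ... `file_5.csv` names are assumptions made up for illustration, not the original desktop folder, and the words are typed out by hand.

```r
# Hypothetical stand-in for the "Desktop/SO Example" folder
example_dir <- file.path(tempdir(), "SO_Example")
dir.create(example_dir, showWarnings = FALSE)

# Same data as shown above, with the five words written out literally
ex_df <- data.frame(A = LETTERS[1:5], B = 1:5,
                    C = c("a", "able", "about", "absolute", "accept"))

# Write five identical CSV files (file names are made up for illustration)
for (i in 1:5) {
  write.csv(ex_df, file.path(example_dir, paste0("file_", i, ".csv")),
            row.names = FALSE)
}

# Check what ended up in the folder
list.files(example_dir)
```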

I imported these and stored them in a list using lapply. Then I made a simple example function to loop through each data frame in the list and perform some operation (using lapply). Lastly, I exported those results as a CSV file back in the same folder using sapply.

Hopefully this helps!

# Define file path to desired folder
file_path <- "~/Desktop/SO Example/"

# Get file names in the folder
file_list <- list.files(path = file_path)

# Use lapply() with read.csv (if they are CSV files) to store data in a list
list_data <- lapply(file_list, function(x) read.csv(paste0(file_path, x)))

# Define some function
somefunction <- function(x){
  paste(x[,1], x[,2], x[,3])
}

# Run the function across your list data using lapply()
results <- lapply(list_data, somefunction)

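# Output to the same folder using sapply
# ([[x]] extracts the result itself rather than a one-element list;
#  invisible() hides the list of NULLs that sapply returns)
invisible(sapply(seq_along(results), function(x) 
  write.csv(results[[x]], 
            paste0(file_path, "results_output_", x, ".csv"), 
            row.names = FALSE)))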
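
To get outputs named after each input file, as the question asks, the same idea can be sketched roughly as follows. This is only an illustration under assumptions: the temp folder, the toy `NEW_chr1.txt`/`NEW_chr2.txt` files, and the helper names `analyse_one` and `process_file` are all hypothetical, and `analyse_one` is a placeholder for the real analysis (for example the `coloc.susie` call and its `$summary`).

```r
# Hypothetical input folder with two small tab-delimited files, for illustration only
in_dir <- file.path(tempdir(), "AL_example")
out_dir <- file.path(in_dir, "output")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
toy <- data.frame(BETA = c(0.1, -0.2, 0.3), Varbeta = c(0.01, 0.02, 0.03))
for (nm in c("NEW_chr1.txt", "NEW_chr2.txt")) {
  write.table(toy, file.path(in_dir, nm), sep = "\t", quote = FALSE, row.names = FALSE)
}

# Full paths of every file whose name begins with NEW (pattern is a regex, not a glob)
file_list <- list.files(in_dir, pattern = "^NEW.*\\.txt$", full.names = TRUE)

# Placeholder for the real analysis; swap in your own code here
analyse_one <- function(df) {
  data.frame(n_rows = nrow(df), mean_beta = mean(df$BETA))
}

# Read, analyse and write one file; the output keeps the input name plus "_output"
process_file <- function(path) {
  df <- read.delim(path, header = TRUE)
  res <- analyse_one(df)
  out_name <- paste0(tools::file_path_sans_ext(basename(path)), "_output.txt")
  out_path <- file.path(out_dir, out_name)
  write.table(res, file = out_path, quote = FALSE, sep = "\t", row.names = FALSE)
  out_path
}

# One output file per input file; returns the paths that were written
out_files <- vapply(file_list, process_file, character(1))
```

Writing the outputs to a separate subfolder is a deliberate choice here: it keeps a later re-run from picking up the `_output.txt` files as new inputs.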
jpsmith
  • Within my function I output my results for each file using this command: results <- print(susie_chr1$summary). How can I ensure that for each file a result is saved and only outputted for that file and then that result is printed in the write.table function? – HKJ3 Jan 08 '23 at 14:59
  • I’m a bit confused - this is what the above code does (just change write.csv to whatever command you are using, i.e., write.table). Have you copied and pasted/tried the code I provided? This may help you understand what it is doing – jpsmith Jan 08 '23 at 15:12
  • My function is not one function. I am doing numerous things like resizing my file etc... and then running an R function etc.... Do I essentially include all this code within the curly brackets? – HKJ3 Jan 08 '23 at 17:34
  • Yes, you can define everything in one function then run that function through `lapply` or you can run each individual function through `lapply` – jpsmith Jan 08 '23 at 19:29
  • Where can I specify header = TRUE for each file? I've tried list_data <- lapply(file_list, function(x) read.txt(paste0(file_path, x) header = TRUE)) but I get an error. – HKJ3 Jan 09 '23 at 14:27
  • you're missing a comma in `read.txt(paste0(file_path, x) header = TRUE))` – jpsmith Jan 09 '23 at 14:29
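
As a hedged illustration of the `header = TRUE` point from the comments: base R has `read.table`, `read.delim` and `read.csv` but, as far as I know, no `read.txt`, so this sketch uses `read.table`. The temp folder and the `NEW_a.txt`/`NEW_b.txt` files are made up for the example.

```r
# Hypothetical tab-delimited files with a header row, written to a temp folder
file_path <- paste0(tempdir(), "/header_example/")
dir.create(file_path, showWarnings = FALSE)
for (nm in c("NEW_a.txt", "NEW_b.txt")) {
  write.table(data.frame(BETA = c(0.1, 0.2), Varbeta = c(0.01, 0.02)),
              paste0(file_path, nm), sep = "\t", quote = FALSE, row.names = FALSE)
}
file_list <- list.files(path = file_path)

# Option 1: anonymous function, with the comma before header = TRUE
list_data <- lapply(file_list, function(x) read.table(paste0(file_path, x), header = TRUE))

# Option 2: pass extra arguments straight through lapply's `...`
list_data2 <- lapply(paste0(file_path, file_list), read.table, header = TRUE)

# Naming the list by file makes it easy to match results to inputs later
names(list_data) <- file_list
```

Both options read the files the same way; the second is just shorter when the only extra work is passing arguments to the reader.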