I have a long list of files with names like file-typeX-sectorY.tsv, where X and Y take values from 0 to 100. I process each of those files with an R program, but reading them one by one like this is impractical: data <- read.table(file='my_info.tsv', sep = '\t', header = TRUE, fill = TRUE). I want to build a bash program that does something like

#!/bin/bash

for i in {0..100..1}
do
         for j in {1..100..1)
         do
                 Rscript program.R < file-type$i-sector$j.tsv
         done

done

My problem is not with the bash script but with the R program. How can I receive the files one by one? I have googled and tried instructions like args <- commandArgs(TRUE) or data <- commandArgs(trailingOnly = TRUE), but I can't find a way to make it work. Could you please help me?

STerliakov
Xavier
  • I think the R function `list.files` might be a better way to read in a list of file names, which you can then pass to the function which processes them. – neilfws Oct 09 '22 at 21:54
  • If the question is "How to load many csv files and join them in a dataframe?", you can do it all in R: [function-to-load-multiple-csv-files-into-single-dataframe](https://stackoverflow.com/questions/23190280/whats-wrong-with-my-function-to-load-multiple-csv-files-into-single-dataframe). For large data, consider using packages like `data.table` or a combination of `readr` + `purrr` for efficiency. – cbo Oct 10 '22 at 13:11

1 Answer

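At the simplest level, your problem may be the (possibly accidental?) redirect in your loop, so remove the <. With the redirect, the shell feeds the file's contents to the script's standard input, and the file name never reaches commandArgs().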
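The effect of the redirect can be seen without R at all. The following is a minimal shell sketch, not part of the original script: show_args is a hypothetical stand-in for program.R that only reports how many arguments it received, and the temporary file is placeholder input.

```shell
#!/bin/bash
# Hypothetical stand-in for program.R: reports the arguments it received.
show_args() {
    echo "argc=$# first=${1:-none}"
}

# Placeholder input file, created only for this demonstration.
tmp=$(mktemp)
printf 'a\tb\n1\t2\n' > "$tmp"

# With '<' the file contents go to stdin and no argument is passed.
with_redirect=$(show_args < "$tmp")
# Without '<' the file name arrives as the first argument.
with_argument=$(show_args "$tmp")

echo "$with_redirect"
echo "$with_argument"
rm -f "$tmp"
```

The first call sees zero arguments, which is why commandArgs(trailingOnly = TRUE) comes back empty when the loop uses the redirect.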

Then a minimal R 'program' to take a command-line argument and do something with it would be

#!/usr/bin/env Rscript

args <- commandArgs(trailingOnly = TRUE)
stopifnot("require at least one arg" = length(args) > 0)
cat("We were called with '", args[1], "'\n", sep="")

We use a 'shebang' line and make the script executable with chmod 0755 basicScript.R. Then your shell double loop, reduced here (and with one typo corrected), becomes

#!/bin/bash

for i in {0..2..1}; do
    for j in {1..2..1}; do
        ./basicScript.R file-type${i}-sector${j}.tsv
    done
done

and this works as we hope with the inner program reflecting the argument:

$ ./basicCaller.sh 
We were called with 'file-type0-sector1.tsv'
We were called with 'file-type0-sector2.tsv'
We were called with 'file-type1-sector1.tsv'
We were called with 'file-type1-sector2.tsv'
We were called with 'file-type2-sector1.tsv'
We were called with 'file-type2-sector2.tsv'
$ 
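If not every type/sector combination exists on disk (an assumption; the question does not say), the brace-expansion loop would call the script with names of files that are not there. A hedged alternative is to loop over the files that actually match the naming pattern. In this sketch, process_existing is a hypothetical helper name, and the command to run is passed in so it can be tried with echo first.

```shell
#!/bin/bash
# Sketch: run a command once per existing file matching the naming pattern.
# process_existing is a hypothetical helper, not part of the original answer.
process_existing() {
    local dir=$1; shift              # directory to search; remaining args are the command
    local f
    for f in "$dir"/file-type*-sector*.tsv; do
        [ -e "$f" ] || continue      # glob matched nothing: skip the literal pattern
        "$@" "$f"                    # e.g. ./basicScript.R path/to/file.tsv
    done
}
# Usage (dry run):  process_existing . echo
# Usage (real run): process_existing . ./basicScript.R
```

Note that the glob returns names in lexicographic order, so type10 sorts before type2; that only matters if the processing order is important.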

Of course, this is horribly inefficient because it launches N x M external R processes, each paying R's start-up cost. The two nested loops could be written in R, and instead of calling the script you would call your script-turned-function.
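A neighbouring option, different from moving the loops into R, is to keep the shell as the caller but start the script only once and pass it every file name. This sketch assumes the R script is changed to iterate over all elements of args rather than only args[1]; run_batched is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: start the command once and hand it every matching file name.
# Assumes the R script loops over all of its arguments (hypothetical change).
run_batched() {
    local dir=$1; shift                          # directory to search; rest is the command
    local files=("$dir"/file-type*-sector*.tsv)  # expand the pattern into an array
    [ -e "${files[0]}" ] || return 0             # no matching files: nothing to do
    "$@" "${files[@]}"                           # one process, many arguments
}
# Usage (dry run):  run_batched . echo
# Usage (real run): run_batched . ./basicScript.R
```

This trades N x M process start-ups for one, though a very long file list can run into the system's limit on command-line length.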

Dirk Eddelbuettel