I know how to run a script from the shell on every file of a given type. I do:

for filePath in path/*.extension; do
    # Variable names are case-sensitive, so use $filePath; quote it to survive spaces
    script.py "$filePath"
done

Currently I have about nine pairs of files with the same extension and very similar base names (think xxx_R1 and xxx_R2). I have a script I want to run that takes in pairs of files. How can I run a script on all those pairs using shell?

Lorikiki
  • https://stackoverflow.com/questions/28725333/looping-over-pairs-of-values-in-bash has solutions for Bash; some of them are portable to any Bourne-compatible shell. See also [Difference between sh and bash](https://stackoverflow.com/questions/5725296/difference-between-sh-and-bash) – tripleee Aug 14 '21 at 16:15

1 Answer

I would list the files matching one pattern, strip off the suffix to form a list of the "base" names, then re-append both suffixes. Something like this:

for base in $(ls *_R1 | sed 's/_R1$//')
do
    f1=${base}_R1
    f2=${base}_R2
    script2.py $f1 $f2
done

Alternatively, you could accomplish the same thing by letting sed do the selection as well as the stripping:

for base in $(ls | sed -n '/_R1$/s///p')
    ...

Both of these are somewhat simplistic, and can fall down if you have files with "funny" names, such as names containing spaces. If that's a possibility for you, you can use some more sophisticated (albeit less obvious) techniques to get around the problem. Several are mentioned in the links @tripleee has posted. An incremental improvement I like to use, to avoid the improper word splitting that a for ... in loop over command output can do, is to use while and read instead:

ls | sed -n '/_R1$/s///p' | while IFS= read -r base
do
    f1=${base}_R1
    f2=${base}_R2
    script2.py "$f1" "$f2"
done

This still isn't perfect, and will fall down if you happen to have a file with a newline in its name (although personally, I believe that if you have a file with a newline in its name, you deserve whatever miseries befall you :-) ).

Again, if you want something perfect, see the links posted elsewhere.
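
For comparison, here is a minimal sketch of the glob-plus-parameter-expansion approach, which sidesteps ls and sed entirely and so copes with spaces (and even newlines) in names. It assumes the same naming as above, with names ending in _R1 and _R2; if your names look like xxx_R1.extension, the patterns would need the extension appended. The names process_pair and run_pairs and the demo files are hypothetical stand-ins, with process_pair taking the place of script2.py.

```shell
#!/bin/sh
# Sketch only: pair each *_R1 file with its *_R2 mate using a glob and
# parameter expansion, with no ls or sed involved.

# Hypothetical stand-in for script2.py; it just reports the pair it was given.
process_pair() {
    printf 'pair: %s + %s\n' "$1" "$2"
}

# Run process_pair on every complete R1/R2 pair in the directory given as $1.
# The body runs in a subshell so the cd does not affect the caller.
run_pairs() (
    cd "$1" || exit 1
    for f1 in *_R1; do
        # If nothing matches, the pattern is left as the literal string; skip it.
        [ -e "$f1" ] || continue
        base=${f1%_R1}          # strip the trailing _R1
        f2=${base}_R2
        if [ -e "$f2" ]; then
            process_pair "$f1" "$f2"
        else
            printf 'no _R2 mate for %s\n' "$f1" >&2
        fi
    done
)

# Demo in a throwaway directory (file names are made up for illustration).
demo_dir=$(mktemp -d)
touch "$demo_dir/a_R1" "$demo_dir/a_R2" \
      "$demo_dir/my sample_R1" "$demo_dir/my sample_R2" \
      "$demo_dir/lonely_R1"
pairs_output=$(run_pairs "$demo_dir")
printf '%s\n' "$pairs_output"
rm -rf "$demo_dir"
```

To use it for real, replace the body of process_pair with a call to script2.py "$1" "$2" and call run_pairs on your own directory. Quoting "$f1" and "$f2" everywhere is what keeps names with spaces in one piece.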

Steve Summit
  • That's a [useless use of `ls`](http://www.iki.fi/era/unix/award.html#ls) altogether. – tripleee Aug 14 '21 at 16:07
  • Thanks for the update, but `ls` to list files is *inherently* useless, and also [buggy](http://mywiki.wooledge.org/ParsingLs). The proper solution is simply `for file in *_R1` and then use the parameter expansion `base=${file%_R1}` inside the loop (or `sed` or `basename` if you really insist on using an expensive external process for something the shell can do easily with built-in facilities). – tripleee Aug 14 '21 at 16:27
  • @tripleee I see your points, and I'm not going to get into an argument about any of this, but I do believe that (a) files with newlines in their names are such an abomination that they're not worth trying to program around (and if I ran the world, I'd disallow them :-) ), and (b) worrying about "using an expensive external process" is silly. No one should feel guilty about using something that's straightforward and that they can remember. Firing up processes and doing grunt work (even if less than 100% efficiently) is what computers are *for*. – Steve Summit Aug 14 '21 at 17:11
  • Oh, and I reject the notion that "`ls` to list files is inherently useless"! If that's true, we should throw every Unix and Linux machine on the planet into a big bonfire, and go back to using OS/360 or something. – Steve Summit Aug 14 '21 at 17:53