Run command on pair of files (different file types) with matching character string

Question

I have a list of files:

catfish.fa
polar.fa
catfish.ids.txt
polar.ids.txt

I want to run a command on each pair of files that share the same name prefix. For example, I'd like to run this:

cat catfish.fa | seqkit grep -f catfish.ids.txt > catfish.output.fa

Similarly...

cat polar.fa | seqkit grep -f polar.ids.txt > polar.output.fa

How can I run this command for each file pair in the directory and in parallel? Thanks for your help!

score 2 · Answer 1 · answered Jan 05 '19 at 07:28

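#!/bin/bash

for f in *.fa
do
   # Strip the last extension: catfish.fa -> catfish
   filename="${f%.*}"
   # Only process .fa files that have a matching .ids.txt file
   if [ -e "${filename}.ids.txt" ]
   then
      cat "$f" | seqkit grep -f "${filename}.ids.txt" > "${filename}.output.fa"
   fi
done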

filename="${f%.*}" strips the last extension from the filename, so catfish.fa becomes catfish. The if selects only the .fa files that have a corresponding .ids.txt file. To run the pairs in parallel, append a & to the end of the cat ... line, and add a wait after done if the script should not exit before the background jobs finish. (Be careful not to spawn too many parallel tasks!)
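One way to heed the warning about too many parallel tasks is to launch the jobs in batches. The following is a minimal sketch, assuming bash; the function name filter_pairs and the default limit of 4 are hypothetical choices, not part of the original answer. It waits for each full batch before starting the next, so one slow file holds up its whole batch.

```shell
#!/bin/bash
# filter_pairs is a hypothetical helper name; its argument is the assumed
# maximum number of jobs to run at once (default 4).
filter_pairs() {
    local max_jobs="${1:-4}" count=0 f name
    for f in *.fa; do
        name="${f%.*}"
        # Skip .fa files without a matching .ids.txt file
        [ -e "${name}.ids.txt" ] || continue
        seqkit grep -f "${name}.ids.txt" < "$f" > "${name}.output.fa" &
        count=$((count + 1))
        # Once a full batch has been launched, wait for it to finish
        if [ $((count % max_jobs)) -eq 0 ]; then
            wait
        fi
    done
    wait   # wait for the final, possibly partial, batch
}
```

Calling filter_pairs 2 in the directory with the files would then run at most two seqkit jobs at a time.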

Cyrus · Answer 2 · 2019-01-05T08:11:09.067

1

With bash's Parameter Expansion:

for file in *.fa; do seqkit grep -f "${file%%.*}.ids.txt" >"${file%%.*}.output.fa" <"$file" & done
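Note that ${file%%.*} removes the longest suffix starting at a dot, while ${file%.*} removes only the shortest. For the names in the question the two give the same result, but they differ once a name contains more than one dot. A small illustration, using a hypothetical filename:

```shell
#!/bin/bash
file="catfish.v2.fa"   # hypothetical name with two dots

echo "${file%.*}"      # shortest match removed: catfish.v2
echo "${file%%.*}"     # longest match removed:  catfish
```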

edited Jan 05 '19 at 08:11

answered Jan 05 '19 at 07:26


Ole Tange · Accepted Answer · 2019-01-05T16:56:57.410

1

This will run one job per CPU core in parallel:

parallel 'cat {} | seqkit grep -f {.}.ids.txt > {.}.output.fa' ::: *fa

May I suggest you run with --dry-run first, so you can see what will be run?

parallel --dry-run 'cat {} | seqkit grep -f {.}.ids.txt > {.}.output.fa' ::: *fa
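The same preview idea can be applied to a plain bash loop by printing each command instead of running it. This is only a sketch, and the function name preview_pairs is hypothetical; unlike the parallel command above, it also skips .fa files that have no matching .ids.txt file.

```shell
#!/bin/bash
# preview_pairs is a hypothetical helper: it prints, rather than runs,
# the command for each .fa file that has a matching .ids.txt file.
preview_pairs() {
    local f name
    for f in *.fa; do
        name="${f%.*}"
        [ -e "${name}.ids.txt" ] || continue
        printf 'cat %s | seqkit grep -f %s > %s\n' \
            "$f" "${name}.ids.txt" "${name}.output.fa"
    done
}
```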

Also consider spending 20 minutes on reading chapter 1+2 of the book GNU Parallel 2018 (print: http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html online: https://doi.org/10.5281/zenodo.1146014). Your command line will love you for it.

edited Jan 05 '19 at 16:56

answered Jan 05 '19 at 11:07


Thank you for your excellent resource. When I tried this the terminal just returned a ">". I am using macOS and the parallel command is installed. – user3105519 Jan 05 '19 at 15:30
End ' was missing. Fixed. – Ole Tange Jan 05 '19 at 16:57
Yep I kept looking at it and was like oh it was missing an '. Thank you for help and your book is excellent. I will spend time learning it. – user3105519 Jan 05 '19 at 18:22
