
I have hundreds of thousands of files, and I want to run the sed command below on each of them:

sed -s -i -e 's/[[:space:]].*//' -e '1 s/^/>/g' -e '3 s/|*//g' -e '3 s/^/>ref/g' -e '1h;2H;1,2d;4G'

So far, I have been using a bash loop:

for i in read_* ; do
    sed -s -i -e 's/[[:space:]].*//' -e '1 s/^/>/g' -e '3 s/|*//g' -e '3 s/^/>ref/g' -e '1h;2H;1,2d;4G' "$i"
    mv "$i" "$i.fasta"
done

How can I use GNU Parallel to speed this up?

ls read_* > list.read.txt
parallel -j $cores -a list.read.txt sed -s -i -e 's/[[:space:]].*//' -e '1 s/^/>/g' -e '3 s/|*//g' -e '3 s/^/>ref/g' -e '1h;2H;1,2d;4G' []

I tried the approach above, where I build a list of the files to process and run 10 jobs at once. However, I get sed-related errors.

SaltedPork
    Interesting problem but you forgot to include the most important bit of information ... *"I get sed related error "* ... What are they? Please add those to your question. Good luck. – shellter Feb 03 '23 at 18:43

1 Answer


Try:

parallel -q -v -j "$cores" -a list.read.txt sed -s -i -e 's/[[:space:]].*//' -e '1 s/^/>/g' -e '3 s/|*//g' -e '3 s/^/>ref/g' -e '1h;2H;1,2d;4G'
  • The -q option is necessary to quote special characters (spaces, >, ...) in the command arguments.
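  • The [] was causing the code to break when I tested it, so I removed it. I don't know what it was supposed to do. If it was meant to be the {} replacement string, it isn't needed here: when the command contains no replacement string, parallel appends each input line to the end of the command.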
  • I added quotes to "$cores" because variable expansions should almost always be quoted. See "When to wrap quotes around a shell variable?". Use Shellcheck to find missing quotes and many other shell code errors.
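
Note that the command above only runs sed; it does not rename the files to .fasta the way the original loop does. One way to do both steps per file is to wrap them in a bash function and hand that to parallel. The following is a minimal sketch, not a tested drop-in: the function name fix_read, the fallback of 4 jobs when $cores is unset, and the use of printf in place of list.read.txt are my own assumptions, and it relies on bash (for export -f) and GNU sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (the name is not from the question): edit one file
# in place with the sed script from the question, then rename it, mirroring
# the body of the original for-loop.
fix_read() {
    sed -s -i -e 's/[[:space:]].*//' -e '1 s/^/>/g' -e '3 s/|*//g' \
        -e '3 s/^/>ref/g' -e '1h;2H;1,2d;4G' -- "$1" &&
    mv -- "$1" "$1.fasta"
}
export -f fix_read   # make the function visible to the bash shells parallel starts

# printf is a bash builtin, so expanding read_* here does not hit the
# "Argument list too long" limit that `ls read_*` can hit with very many
# files. Names are NUL-separated and read by parallel with -0.
shopt -s nullglob
files=(read_*)
if command -v parallel >/dev/null && (( ${#files[@]} )); then
    # ${cores:-4} falls back to 4 jobs if $cores is unset (an assumed default).
    printf '%s\0' "${files[@]}" | parallel -0 -j "${cores:-4}" fix_read {}
fi
```

Because the function receives the file name as "$1", the sed script keeps its single quotes and -q is not needed. As with the original loop, running this twice would process the already renamed files again, since read_* also matches read_*.fasta.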
pjh