
I'm trying to use GNU parallel with some basic bioinformatics tools, e.g. lastz. Say I have 10 seqs and I want to run lastz on all of them. I use:

parallel --dryrun lastz 'pathToFile/seq{}.fa query.fasta --format=text > LASTZ_results_seq{}' ::: {1..10} 

Which works fine and returns:

lastz pathToFile/seq1.fa query.fasta --format=text > LASTZ_results_seq1
lastz pathToFile/seq2.fa query.fasta --format=text > LASTZ_results_seq2
lastz pathToFile/seq3.fa query.fasta --format=text > LASTZ_results_seq3
...
lastz pathToFile/seq10.fa query.fasta --format=text > LASTZ_results_seq10

But ideally I'd like this step to be part of a bash script which takes three command-line arguments, so that the range of seqs (e.g. 1 to 10) is given on the command line (with $2 = startValue, $3 = endValue). I thought that changing it to this would work:

parallel --dryrun lastz 'pathToFile/seq{}.fa query.fasta --format=text > LASTZ_results_seq{}' ::: {"$2".."$3"}

but instead, that returns

lastz pathToFile//seq\{\1..\10\} query.fasta --format=text > LASTZ_results_seq\{\1..\10\}

Can anyone please tell me what I'm doing wrong here? It looks like it is interpreting $2 as 1, and $3 as 10, but then fails to treat it as a range of numbers...

gizmo

2 Answers


Bash brace ranges do not accept variables, because brace expansion is performed before variable expansion. See this post:

How do I iterate over a range of numbers defined by variables in Bash?

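Thus, I suggest you replace the brace range with command substitution: change {$1..$2} to $(seq $1 $2), or in your script, {"$2".."$3"} to $(seq "$2" "$3").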

For example, see this test script:

$ cat foo
parallel echo ::: {1..3}
parallel echo ::: {$1..$2}
parallel echo ::: $(seq $1 $2)

When called as ./foo 1 3, it produces the following output:

1
2
3
{1..3}
1
2
3
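
Applied to the script in the question, this could look roughly like the sketch below. It assumes $2 and $3 are non-negative integers, as in the question. The helper name int_range is made up for illustration, and the input check is an addition that was not asked for. Piping seq into parallel, rather than passing the numbers after :::, keeps the list off the command line.

```shell
#!/usr/bin/env bash
# Sketch only: $2 = startValue, $3 = endValue, as in the question.
# int_range is a hypothetical helper, not part of GNU parallel.

# Print the integers from $1 to $2, one per line.
# Return 1 if either bound is empty or not a non-negative integer.
int_range() {
    for n in "$1" "$2"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "start and end must be non-negative integers" >&2
                return 1
                ;;
        esac
    done
    seq "$1" "$2"
}

# Only call parallel when the script received its three arguments.
if [ "$#" -ge 3 ]; then
    int_range "$2" "$3" |
        parallel --dryrun lastz 'pathToFile/seq{}.fa query.fasta --format=text > LASTZ_results_seq{}'
fi
```

Drop --dryrun once the printed commands look right.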
pasaba por aqui
  • `seq $1 $2 | parallel echo` # To avoid problems if the number of arguments is bigger than a shell line can fit – Ole Tange Jul 17 '15 at 11:56

This is not what you are asking, but it might be a better solution:

parallel --dryrun lastz {} query.fasta --format=text '>' LASTZ_results_{/.} ::: pathToFile/seq*.fa

If you get "Argument list too long", try:

printf '%s\n' pathToFile/seq*.fa | parallel --dryrun lastz {} query.fasta --format=text '>' LASTZ_results_{/.} 

This way you do not need to know in advance how many seq*.fa there are.
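
To see what {/.} contributes to the output name: GNU parallel replaces it with the basename of the input with its extension removed. The sketch below mimics that with plain shell parameter expansion. The function name result_name is made up for illustration, and this is not how parallel implements it.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the effect of {/.} for a single path.
# result_name is a hypothetical helper used only for illustration.
result_name() {
    base=${1##*/}      # drop the directory part:  seq1.fa
    stem=${base%.*}    # drop the last extension:  seq1
    printf 'LASTZ_results_%s\n' "$stem"
}

result_name pathToFile/seq1.fa   # prints LASTZ_results_seq1
```

So the output files keep the same LASTZ_results_seqN names as in the question.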

Ole Tange
  • Thank you! I really like this solution, it's much simpler. The only issue I'm getting is that if I have too many seq*.fa in the directory (e.g. >100000), it fails ("Argument list too long"). Is there a way to make it work for any number of seqs? – gizmo Jul 20 '15 at 00:22
  • Oh wait never mind - just noticed your comment on the other solution! – gizmo Jul 20 '15 at 04:47