3

The output of my first command line "bcftools query -l {input.invcf} | head -n 1" prints the name of the first individual of vcf file (i.e. IND1). I want to use that output in selectvariants GATK in -sn IND1 option. How is it possible to integrate the 1st comamnd line in snakemake in order to use it's output in the next one?

rule selectvar:
    input:
        invcf="{family}_my.vcf"
    params:
        ind= ???
        ref="ref.fasta"
    output:
        out="{family}.dn.vcf"
    shell:
        """
        bcftools query -l {input.invcf} | head -n 1 > {params.ind}
        gatk --java-options "-Xms2G -Xmx2g -XX:ParallelGCThreads=2" SelectVariants -R {params.ref} -V {input.invcf} -sn {params.ind} -O {output.out}
        """
SultanOrazbayev
  • 14,900
  • 3
  • 16
  • 46
user3224522
  • 1,119
  • 8
  • 19
  • 1
    Note that you have to be careful in using `head` in snakemake shell scripts. See https://stackoverflow.com/questions/46569236/handling-sigpipe-error-in-snakemake – dariober Feb 04 '22 at 17:40

3 Answers3

3

There are several options, but the easiest one is to store the results into a temporary bash variable:

rule selectvar:
   ...
   shell:
        """
        myparam=$(bcftools query -l {input.invcf} | head -n 1)
        gatk -sn "$myparam" ...
        """

As noted by @dariober, if one modifies pipefail behaviour, there could be unexpected results, see the example in this answer.

SultanOrazbayev
  • 14,900
  • 3
  • 16
  • 46
2

When I have to do these things I prefer to use run instead of shell, and then shell out only at the end. The reason for this is because it makes it possible for snakemake to lint the run statement, and to exit early if something goes wrong instead of following through with a broken shell command.

rule selectvar:
    input:
        invcf="{family}_my.vcf"
    params:
        ref="ref.fasta"
        gatk_opts='--java-options "-Xms2G -Xmx2g -XX:ParallelGCThreads=2" SelectVariants'
    output:
        out="{family}.dn.vcf"
    run:
        opts = "{params.gatk_opts} -R {params.ref} -V {input.invcf} -O {output.out}"
        sn_parameter = shell("bcftools query -l {input.invcf} | head -n 1")
        # we could add a sanity check here if necessary, before shelling out
        shell(f"gatk {options} {sn_parameter}")
        """
Daniel
  • 11,332
  • 9
  • 44
  • 72
1

I think I found a solution:

rule selectvar:
    input:
        invcf="{family}_my.vcf"
    params:
        ref="ref.fasta"
    output:
        out="{family}.dn.vcf"
    shell:
        """
        gatk --java-options "-Xms2G -Xmx2g -XX:ParallelGCThreads=2" SelectVariants -R {params.ref} -V {input.invcf} -sn `bcftools query -l {input.invcf} | head -n 1` -O {output.out}
        """
user3224522
  • 1,119
  • 8
  • 19
  • 1
    This works, but might be harder to debug due to the nested evaluation of a command... the approach I provide is basically the same, but because you have an intermediate result in a variable you could print it (for debugging/info)... – SultanOrazbayev Feb 04 '22 at 13:01
  • 1
    it works perfectly, thanks, also for the suggestion comment – user3224522 Feb 04 '22 at 13:15