I am trying to use Snakemake to run the rnaQUAST tool on multiple inputs identified by two different, but paired, sets of keywords. I do not want all combinations of these keywords, only specific pairings. My understanding is that I need to pass `zip` to the `expand()` call in my `rule all`, as below. However, Snakemake appears to populate the `{sample}` and `{reference}` wildcards in an unexpected way:
```
samples_rnaQUAST = ["TMW3250_15", "TMW3250_20", "TMW3256_15", "TMW3256_20", "TMW3261_15", "TMW3261_20",
                    "TMW3673_15", "TMW3673_20", "TMW3285_15", "TMW3285_20", "TMW3275_15", "TMW3275_20",
                    "TMW3681_15", "TMW3681_20", "TMW3287_15", "TMW3287_20"]
references_rnaQUAST = ["German_ale", "German_ale", "German_ale", "German_ale",
                       "English_ale", "English_ale", "American_ale", "American_ale",
                       "Frohberg", "Frohberg", "Frohberg", "Frohberg",
                       "Saaz", "Saaz", "Saaz", "Saaz"]

rule all:
    input:
        expand("rnaquast/{sample}{reference}/short_report.txt", zip, sample=samples_rnaQUAST, reference=references_rnaQUAST)

rule rnaQUAST:
    input:
        transcriptome="trinity/{sample}/default_by_condition_trinity/Trinity.fasta",
        reference="genomes/{reference}_genome.fasta",
        gtf="genomes/AUGUSTUS_annotations/{reference}.gtf"
    output:
        report="rnaquast/{sample}{reference}/short_report.txt"
    threads: 16
    shell:
        """
        /home/user/miniconda3/envs/rnaquast/share/rnaquast-1.5.1-0/rnaQUAST.py \
            --transcripts {input.transcriptome} \
            --reference {input.reference} \
            --gtf {input.gtf} \
            -t {threads} \
            -o rnaquast/{output.report}
        """
```
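For reference, the intended pairing can be checked outside Snakemake. The sketch below is a plain-Python stand-in for `expand(..., zip, ...)`, assuming that `zip` pairs the i-th sample with the i-th reference instead of forming every combination. `expand_zip` is a hypothetical helper, not part of Snakemake's API, and it raises on mismatched list lengths purely as an extra sanity check.

```python
# Plain-Python stand-in for Snakemake's expand(pattern, zip, ...).
# Assumption: with zip, the pattern is formatted once per position,
# pairing the i-th sample with the i-th reference (no cartesian product).

samples_rnaQUAST = ["TMW3250_15", "TMW3250_20", "TMW3256_15", "TMW3256_20",
                    "TMW3261_15", "TMW3261_20", "TMW3673_15", "TMW3673_20",
                    "TMW3285_15", "TMW3285_20", "TMW3275_15", "TMW3275_20",
                    "TMW3681_15", "TMW3681_20", "TMW3287_15", "TMW3287_20"]
references_rnaQUAST = ["German_ale", "German_ale", "German_ale", "German_ale",
                       "English_ale", "English_ale", "American_ale", "American_ale",
                       "Frohberg", "Frohberg", "Frohberg", "Frohberg",
                       "Saaz", "Saaz", "Saaz", "Saaz"]

PATTERN = "rnaquast/{sample}{reference}/short_report.txt"


def expand_zip(pattern, **wildcards):
    """Hypothetical helper mimicking expand(pattern, zip, **wildcards)."""
    # Sanity check added by this sketch: all lists must line up one-to-one.
    lengths = {len(values) for values in wildcards.values()}
    if len(lengths) != 1:
        raise ValueError("all wildcard lists must have the same length")
    names = list(wildcards)
    # Walk the lists in parallel and fill the pattern once per position.
    return [pattern.format(**dict(zip(names, combo)))
            for combo in zip(*wildcards.values())]


targets = expand_zip(PATTERN, sample=samples_rnaQUAST, reference=references_rnaQUAST)
for target in targets[:3]:
    print(target)
# rnaquast/TMW3250_15German_ale/short_report.txt
# rnaquast/TMW3250_20German_ale/short_report.txt
# rnaquast/TMW3256_15German_ale/short_report.txt
```

With these lists it yields 16 targets, one per sample, and each file name joins the sample and reference with nothing in between.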
With Snakemake 5.10.0, I get the following error, in which the `{sample}` and `{reference}` wildcards are filled in incorrectly:
```
Building DAG of jobs...
MissingInputException in line 65 of /home/user/analyses/Snakefile:
Missing input files for rule rnaQUAST:
genomes/e_genome.fasta
trinity/TMW3250_15German_al/default_by_condition_trinity/Trinity.fasta
genomes/AUGUSTUS_annotations/e.gtf
```
Why does Snakemake split the strings this way, and why does it assign the wrong portions of the strings passed from `rule all` to each wildcard?
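One way to see where the split comes from is to reproduce the matching in plain Python. This sketch rests on an assumption: that Snakemake turns each `{wildcard}` in an output pattern into a named regex group with the default pattern `.+` and matches the requested path against it to the end. `pattern_to_regex` is a hypothetical helper, not Snakemake's own code, and the `TMW\d+_\d+` constraint is only an illustration.

```python
import re


def pattern_to_regex(pattern, constraints=None):
    """Hypothetical approximation of Snakemake's wildcard-to-regex step."""
    constraints = constraints or {}
    regex, last = "", 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        # Literal text between wildcards is matched exactly.
        regex += re.escape(pattern[last:m.start()])
        name = m.group(1)
        # Each wildcard becomes a named group; ".+" unless constrained.
        regex += f"(?P<{name}>{constraints.get(name, '.+')})"
        last = m.end()
    # The whole path must match, up to the end of the string.
    regex += re.escape(pattern[last:]) + "$"
    return re.compile(regex)


OUTPUT = "rnaquast/{sample}{reference}/short_report.txt"
target = "rnaquast/TMW3250_15German_ale/short_report.txt"

# Default ".+" groups: the greedy first group keeps all but one character.
default = pattern_to_regex(OUTPUT).match(target).groupdict()
print(default)      # {'sample': 'TMW3250_15German_al', 'reference': 'e'}

# With an illustrative constraint on {sample}, only one split is possible.
constrained = pattern_to_regex(OUTPUT, {"sample": r"TMW\d+_\d+"}).match(target).groupdict()
print(constrained)  # {'sample': 'TMW3250_15', 'reference': 'German_ale'}
```

Under this approximation, the greedy `sample` group reproduces exactly the `TMW3250_15German_al` / `e` split seen in the error. Because nothing separates `{sample}` from `{reference}` in the pattern, every split of the combined string is a valid match; restricting one of the wildcards (Snakemake's `wildcard_constraints` serves this purpose) is one way to make the match unambiguous. An underscore separator alone would not be enough here, since both the sample and reference names contain underscores.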