
I am an absolute beginner with Snakemake, and I am building a pipeline as I learn. My question: if the Snakefile is placed in the same directory as the data file I want to process, a NameError occurs, but if I move the Snakefile to the parent directory and edit the paths in `input:` and `output:`, the code works. What am I missing?

```
rule sra_convert:
    input:
        "rna/{id}.sra"
    output:
        "rna/fastq/{id}.fastq"
    shell:
        "fastq-dump {input} -O {output}"
```

The above code works fine when I run it with:

```
snakemake -p rna/fastq/SRR873382.fastq
```

However, if I move the Snakefile into the "rna" directory, where the SRR873382.sra file is, and edit the code as below:

```
rule sra_convert:
    input:
        "{id}.sra"
    output:
        "fastq/{id}.fastq"
    message:
        "Converting from {id}.sra to {id}.fastq"
    shell:
        "fastq-dump {input} -O {output}"
```

and run:

```
snakemake -p fastq/SRR873382.fastq
```

I get the following error:

```
Building DAG of jobs...
Job counts:
    count   jobs
    1   sra_convert
    1
RuleException in line 7 of /home/sarc/Data/rna/Snakefile:
NameError: The name 'id' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}
```

Solution

```
rule sra_convert:
    input:
        "{id}.sra"
    output:
        "fastq/{id}.fastq"
    message:
        "Converting from {wildcards.id}.sra to {wildcards.id}.fastq"
    shell:
        "fastq-dump {input} -O {output}"
```

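The above code runs without error. The only change is that the `message:` line refers to the wildcard as `{wildcards.id}` instead of `{id}`.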

YoungP
  • I'm not familiar with Snakemake, but I would try using "./{id}.sra" to see whether it works. – Wisthler May 17 '19 at 10:48
  • That didn't work. Same error message. – YoungP May 17 '19 at 11:23
  • What is your snakemake and OS versions? Your code works fine on Mac with snakemake `v5.4.0`. Btw, try replacing wildcard term `id` with something else; my thinking is that `id` may be a [bad variable name to use](https://stackoverflow.com/a/77612/3998252). – Manavalan Gajapathy May 17 '19 at 14:57
  • @YoungP Are you sure you're not leaving something else out? This is the type of error one sees when they try to use `{id}` in the `shell` string. – merv May 17 '19 at 18:11
  • Is this an exact copy/paste of your code, without manual modifications? I suspect the problem could be caused by typos in your code that we can't see if this is not an exact match. Also, please provide the exact filename of the one that has xxxx placeholders. There could be a problem if the filename contains spaces, etc. – Dmitry Kuzminov May 17 '19 at 19:32
  • @JeeYem I am trying this on CentOS Linux 7. My snakemake version is `v5.4.0`. I did try changing the name to `sample` and got the same result. If `id` were a bad name, that would also affect the first example, so I don't think this is the issue. @Dmitry Kuzminov sorry, I'll correct the xxxx in the code, but it's just numbers, and the code is an exact copy/paste. – YoungP May 20 '19 at 04:03
  • I have to apologize, @Dmitry Kuzminov. It turns out I missed one line from the second code block, the `message:` line (I edited the post). But why does one need to reference it as `{wildcards.id}` instead of `{id}`? – YoungP May 21 '19 at 06:32
  • Wildcards are used only in input/output for matching. Other parts of the rule have access to many variables (for example, `{input}` and `{output}` in your example). If the wildcards were placed in the same namespace, there could be name collisions. – Dmitry Kuzminov May 21 '19 at 06:40
  • @YoungP It looks like you solved your issue. It would be preferable to leave your question without the corrected version and instead post that code as an answer to your own question. At that point, feel free to accept your own answer. – merv May 22 '19 at 01:25
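The namespace explanation in the comments can be illustrated with plain Python `str.format`. This is a hypothetical simplification, not Snakemake's actual code: it assumes a rule string is formatted against a namespace that holds only `input`, `output` and a `wildcards` object, and `render` is an illustrative helper. A bare `{id}` finds no such top-level name, while `{wildcards.id}` resolves through attribute access on `wildcards`. Keeping all wildcards behind the single name `wildcards` is also what prevents a wildcard from colliding with `input` or `output`.

```python
from types import SimpleNamespace

# Hypothetical, simplified namespace for the rule in the question. Snakemake
# itself passes richer objects (input/output are list-like, and names such as
# params and threads are also available); plain strings keep the sketch short.
wildcards = SimpleNamespace(id="SRR873382")
namespace = {
    "input": "SRR873382.sra",
    "output": "fastq/SRR873382.fastq",
    "wildcards": wildcards,
}


def render(template: str) -> str:
    """Format a rule string using only the names defined in `namespace`."""
    try:
        return template.format(**namespace)
    except KeyError as err:
        # str.format raises KeyError for an unknown name; re-raise it in the
        # style of the NameError reported in the question.
        raise NameError(f"The name {err} is unknown in this context.") from err


# Works: `wildcards` is in the namespace and `.id` is attribute access on it.
print(render("Converting from {wildcards.id}.sra to {wildcards.id}.fastq"))
# prints: Converting from SRR873382.sra to SRR873382.fastq

# Fails: there is no top-level name `id`, only `wildcards.id`.
try:
    render("Converting from {id}.sra to {id}.fastq")
except NameError as exc:
    print(exc)  # prints: The name 'id' is unknown in this context.
```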

1 Answer


I believe that the best source that answers your actual question is:

https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#wildcards

> If the rule’s output matches a requested file, the substrings matched by the wildcards are propagated to the input files and to the variable wildcards, that is here also used in the shell command. The wildcards object can be accessed in the same way as input and output, which is described above.
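The matching step described in the quoted documentation can be sketched roughly as follows. This is a simplified, hypothetical emulation, not Snakemake's implementation: `pattern_to_regex` and `resolve` are illustrative names, each wildcard is assumed to match `.+`, and a wildcard name may appear at most once per pattern.

```python
import re
from types import SimpleNamespace
from typing import Optional, Tuple


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Turn e.g. 'fastq/{id}.fastq' into a regex with one named group per wildcard.

    Simplified assumptions: every wildcard matches '.+', and each wildcard
    name appears at most once in the pattern.
    """
    parts = []
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text
        parts.append(f"(?P<{m.group(1)}>.+)")            # wildcard group
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def resolve(output_pattern: str, input_pattern: str,
            requested: str) -> Optional[Tuple[str, SimpleNamespace]]:
    """Match a requested file against an output pattern and propagate the
    wildcard values to the input pattern and to a `wildcards` object."""
    match = pattern_to_regex(output_pattern).fullmatch(requested)
    if match is None:
        return None  # this rule cannot produce the requested file
    values = match.groupdict()
    return input_pattern.format(**values), SimpleNamespace(**values)


input_file, wildcards = resolve("fastq/{id}.fastq", "{id}.sra",
                                "fastq/SRR873382.fastq")
print(input_file)    # SRR873382.sra
print(wildcards.id)  # SRR873382
```

With the first rule's patterns, `resolve("rna/fastq/{id}.fastq", "rna/{id}.sra", "rna/fastq/SRR873382.fastq")` returns `rna/SRR873382.sra` with the same `id`; only the fixed parts of the patterns differ between the two versions of the rule.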

Dmitry Kuzminov