10

I'm trying to use Snakemake rules within a loop so that the rule takes the output of the previous iteration as input. Is that possible and if yes how can I do that?

Here is my example

  1. Setup the test data
mkdir -p test
echo "SampleA" > test/SampleA.txt
echo "SampleB" > test/SampleB.txt
  1. Snakemake
SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        # Output of the final loop
        expand("loop3/{sample}.txt", sample = SAMPLES)


#### LOOP ####
for i in list(range(1, 4)):
    # Setup prefix for input
    if i == 1:
        prefix = "test"
    else:
        prefix = "loop%s" % str(i-1)

    # Setup prefix for output
    opref =  "loop%s" % str(i)

    # Rule
    rule loop_rule:
        input:
            prefix+"/{sample}.txt"
        output:
            prefix+"/{sample}.txt"
            #expand("loop{i}/{sample}.txt", i = i, sample = wildcards.sample)
        params:
            add=prefix
        shell:
            "awk '{{print $0, {params.add}}}' {input} > {output}"

Trying to run the example yields the ERROR CreateRuleException in line 26 of /Users/fabiangrammes/Desktop/Projects/snake_loop/Snakefile: The name loop_rule is already used by another rule. If anyone spots an option to get that thing to work it would be great!

Thanks !

merv
  • 67,214
  • 13
  • 180
  • 245
Fabian_G
  • 431
  • 4
  • 16

2 Answers2

9

I think this is a nice opportunity to use recursive programming. Rather than explicitly including conditionals for every iteration, write a single rule that transitions from iteration (n-1) to n. So, something along these lines:

SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        expand("loop3/{sample}.txt", sample=SAMPLES)

def recurse_sample(wcs):
    n = int(wcs.n)
    if n == 1:
        return "test/%s.txt" % wcs.sample
    elif n > 1:
        return "loop%d/%s.txt" % (n-1, wcs.sample)
    else:
        raise ValueError("loop numbers must be 1 or greater: received %s" % wcs.n)

rule loop_n:
    input: recurse_sample
    output: "loop{n}/{sample}.txt"
    wildcard_constraints:
        sample="[^/]+",
        n="[0-9]+"
    shell:
        """
        awk -v loop='loop{wildcards.n}' '{{print $0, loop}}' {input} > {output}
        """

As @RussHyde said, you need to be proactive about ensuring no infinite loops are triggered. To this end, we ensure all cases are covered in recurse_sample and use wildcard_constraints to make sure the matching is precise.

merv
  • 67,214
  • 13
  • 180
  • 245
6

My understanding is that your rules are converted to python code before they are ran and that all the raw python code present in your Snakefile is ran sequentially during this process. Think of it as your snakemake rules being evaluated as python functions.

But there's a constraint that any rule can only be evaluated to a function once.

You can have if/else expressions and differentially evaluate a rule (once) based on config values etc, but you can't evaluate a rule multiple times.

I'm not really sure how to rewrite your Snakefile to achieve what you want. Is there a real example that you could give where looping constructs appear to be required?

--- Edit

For fixed number of iterations, it may be possible to use an input-function to run the rule several times. (I would caution against doing this though, be extremely careful to disallow infinite loops)

SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        # Output of the final loop
        expand("loop3/{sample}.txt", sample = SAMPLES)

def looper_input(wildcards):
    # could be written more cleanly with a dictionary
    if (wildcards["prefix"] == "loop0"):
        input = "test/{}.txt".format(wildcards["sample"])
    else if (wildcards["prefix"] == "loop1"):
        input = "loop0/{}.txt".format(wildcards["sample"])
    ...
    return input


rule looper:
    input:
            looper_input
    output:
            "{prefix}/{sample}.txt"
    params:
            # ? should this be add="{prefix}" ?
            add=prefix
    shell:
            "awk '{{print $0, {params.add}}}' {input} > {output}"
Russ Hyde
  • 2,154
  • 12
  • 21
  • Thanks for the input Russ. My real world example iterative estimation of SNP effects. Which I have to do iterative. Does anyone know if its possible to assign a rule name via a function - that could be a possible solution for me – Fabian_G May 23 '19 at 12:45
  • Is there a reason you can't define a loop within the `run`/`shell` of the rule? – Russ Hyde May 23 '19 at 12:50
  • Maybe that can work, not sure. In practice I have 4-5 separate rules. I’ll try it tonight. Traveling right now – Fabian_G May 23 '19 at 12:54
  • Is it a fixed number of iterations, or is it an iterate-to-convergence problem. In the former case, you could use an input-function to permit iteration. I'll try and add some code. – Russ Hyde May 23 '19 at 13:01
  • Thanks for the effort Russ! I'm going to accept merv's answer since it's a bit more elegant, but I really appreciate the help – Fabian_G May 23 '19 at 20:33
  • No probs, but please please make sure you get the recursion correct. An infinite loop will stuff your hard drive – Russ Hyde May 23 '19 at 21:25