
I have several lists of configuration values, and I need to run a Python script once for every combination of them:

versions = ['lg', 'sm']
start_time = ['0', '1']
end_time = ['2']

What I want is for Snakemake to run these commands for me:

python my_script.py -v lg -s 0 -e 2 > lg_0_2.out
python my_script.py -v lg -s 1 -e 2 > lg_1_2.out
python my_script.py -v sm -s 0 -e 2 > sm_0_2.out
python my_script.py -v sm -s 1 -e 2 > sm_1_2.out

but I can't figure out how to do this in Snakemake. Any ideas?

j sad

1 Answer


Snakemake has an expand() function that fills a filename pattern with every combination (the Cartesian product) of the wildcard values you give it, which is the operation you are describing. The usual approach is to list the desired output files as the input of the first rule (the default target, rule all below), and then write a second rule (myrule below) whose output pattern matches those filenames and whose wildcards supply the arguments for the command. In code, it would go something like:

Snakefile

versions = ['lg', 'sm']
start_time = ['0', '1']
end_time = ['2']

rule all:
    input:
        expand("{version}_{start}_{end}.out", 
               version=versions, start=start_time, end=end_time)

rule myrule:
    # The regex after each comma constrains what that wildcard may match
    output: "{version,[^_]+}_{start,[0-9]+}_{end,[0-9]+}.out"
    shell:
        """
        python my_script.py -v {wildcards.version} -s {wildcards.start} -e {wildcards.end} > {output}
        """

Running snakemake in the directory where this Snakefile resides would then generate the desired files.
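
To see which targets the expand() call in rule all asks for, here is a minimal plain-Python sketch of the same combination step using itertools.product. It does not import Snakemake; the helper name expand_outputs is hypothetical and only stands in for expand() for illustration.

```python
from itertools import product

versions = ['lg', 'sm']
start_time = ['0', '1']
end_time = ['2']


def expand_outputs(pattern, **wildcards):
    """Hypothetical stand-in for expand(): fill `pattern` with every
    combination (Cartesian product) of the given wildcard values."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[name] for name in names))
    ]


targets = expand_outputs("{version}_{start}_{end}.out",
                         version=versions, start=start_time, end=end_time)
print(targets)
# ['lg_0_2.out', 'lg_1_2.out', 'sm_0_2.out', 'sm_1_2.out']
```

These are the four filenames from the question, so rule all ends up requesting exactly one output per combination.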

merv
  • This works, but in my version I need to have the part in `rule all` as the `input`, and I have my shell command as a simple string. One question I have, though: why does it need to be in two separate rules? Why can't I just put what's in the `all` rule as the `output` of `myrule`? It seems redundant to have both, but when I tried that, Snakemake didn't recognize the parts of the `wildcards` object. – j sad Nov 25 '19 at 17:42
  • @jsad Sorry about that, I was doing it from memory and forgot the `input:`. Snakemake is sometimes described as a *pull*-based mechanism. It gets a list of desired outputs either from the command line (you request `snakemake {lg,sm}_{0,1}_2.out`) or from the `input` of the first rule (in which case one just runs `snakemake`). The `output` patterns of the rules are then matched against the requested filenames, and the match fills in the wildcards. In a standard pipeline, you will typically have many rules and only need to list the final outputs that are not dependencies of anything else. – merv Nov 25 '19 at 18:17
  • Thanks, @merv, for that explanation and the helpful answer! – j sad Nov 26 '19 at 17:15
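
The matching step described in the comments can be sketched in plain Python. This is only an illustration, assuming the output pattern of myrule behaves like a regular expression with one named group per wildcard; it is not Snakemake's actual internals, and the names OUTPUT_RE, parse_target, and build_command are hypothetical.

```python
import re

# Hand-written regex meant to mirror the output pattern
# "{version,[^_]+}_{start,[0-9]+}_{end,[0-9]+}.out" (an assumption, see above).
OUTPUT_RE = re.compile(
    r"(?P<version>[^_]+)_(?P<start>[0-9]+)_(?P<end>[0-9]+)\.out"
)


def parse_target(filename):
    """Return the wildcard values for a requested file, or None if no match."""
    match = OUTPUT_RE.fullmatch(filename)
    return match.groupdict() if match else None


def build_command(filename):
    """Build the shell command that would produce the requested file."""
    wildcards = parse_target(filename)
    if wildcards is None:
        raise ValueError(f"no rule produces {filename!r}")
    return (
        f"python my_script.py -v {wildcards['version']} "
        f"-s {wildcards['start']} -e {wildcards['end']} > {filename}"
    )


print(build_command("lg_0_2.out"))
# python my_script.py -v lg -s 0 -e 2 > lg_0_2.out
```

This is why the two rules are not redundant: the first rule only names the files that are wanted, and the wildcard values exist only once a requested filename has been matched against an output pattern.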