In a Snakemake workflow, I would like to run a rule without it triggering any of the rules that produce its input.
A sample scenario is as follows: I have a rule A that is costly and produces many output files from input files:
rule A:
    input: "{name}.in"
    output: "{name}.out"
    shell: "touch {input} {output}"  # just a dummy, replace with actual costly task
A second rule B takes the output files and uploads them to a server:
rule B:
    input: "{name}.out"
    output: touch("{name}.up")
    shell: "curl -F 'data={input}' http://google.com/upload"
The third rule C is just a usual all rule that acts as the terminal rule to trigger all the rules that produce its inputs:
names = ["x1", "x2", "x3"]  # dummy for long list

rule C:
    input: expand("{name}.up", name=names)
Assume there was an error in rule B, such that the expensive rule A completed but rule B did not.
I would like to trigger only rule B and rule C, without rule A being run again.
The problem is that, for some reason, rule A always runs, even though output files such as x1.out are already present. This shouldn't be the case, but it is.
I'm now looking for a Snakemake CLI option that allows me to prevent rule A from being run.
I could find a CLI option --until, which does exactly the opposite: it runs all rules up to a certain rule. I would like something like --from, which starts at B and fails if the inputs cannot be found.
I don't know exactly why rule A gets triggered. The input files have not been updated. Nonetheless, A is run (in fact, the real workflow is much more complicated; the above is simplified a lot).
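To double-check the claim that the input files have not been updated, a small standalone script along the following lines could compare modification times outside of Snakemake. This is only a sketch: `stale_outputs` is a hypothetical helper, not part of Snakemake, and it assumes the `{name}.in` / `{name}.out` naming from the rules above. It only mirrors a plain "input newer than output" check; Snakemake itself may decide to rerun a rule for other reasons as well.

```python
import os

names = ["x1", "x2", "x3"]  # same dummy list as in the Snakefile


def stale_outputs(names, directory="."):
    """Return the names whose .out file is missing or older than its .in file."""
    stale = []
    for name in names:
        src = os.path.join(directory, f"{name}.in")
        dst = os.path.join(directory, f"{name}.out")
        if not os.path.exists(dst):
            # Output is missing, so rule A would have to run for this name.
            stale.append(name)
        elif os.path.exists(src) and os.path.getmtime(src) > os.path.getmtime(dst):
            # Input was modified after the output was written.
            stale.append(name)
    return stale
```

Calling `stale_outputs(names)` in the working directory returns the names for which a purely timestamp-based check would expect rule A to run. An empty list would suggest that something other than file modification times is triggering A.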
In short: is there a CLI option that allows me to specify a rule that should be run, including all downstream rules, but none of the upstream rules? Or is this impossible?