
I have a workflow that begins by downloading files from a public database, and then in subsequent steps processes these files to create several aggregated data tables.

I’m testing the workflow on a machine with no internet connection. I ran the preliminary data download steps on another machine and copied the resulting files over to this machine, and now I’m trying to run the rest of the workflow. When I run `snakemake -np`, it reports that all of the data download jobs still need to be completed, even though the target files already exist. I’ve even marked these files as `ancient()` in the subsequent processing rules, but this doesn’t help.

How can I convince Snakemake that these jobs don’t need to be re-run?

Daniel Standage
    Try flag `--reason` ([`Print the reason for each executed rule.`](https://snakemake.readthedocs.io/en/stable/executable.html#OUTPUT)) to figure out why snakemake wants to run it. – Manavalan Gajapathy Sep 19 '18 at 20:33
  • Brilliant. Dependent file with newer timestamp. `touch`ing the downloaded files solved the issue! @JeeYeem, if you post your comment as an answer I'll be happy to give you credit! – Daniel Standage Sep 20 '18 at 13:23
  • Thinking again, `ancient()` should have solved your problem, right? Any clue why not? – Manavalan Gajapathy Sep 20 '18 at 17:59
  • @JeeYem I may have erroneously marked the target files as `ancient()`, rather than the file they depended on. – Daniel Standage Sep 21 '18 at 01:56
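As the last comment suggests, `ancient()` only suppresses timestamp checks for the files it wraps in a rule's *input*; wrapping a rule's own targets has no effect. A minimal sketch of the intended placement (rule names, paths, and commands here are hypothetical, not from the original workflow):

```python
rule download:
    output:
        "data/raw/{sample}.txt"
    shell:
        "wget -O {output} https://example.org/{wildcards.sample}.txt"

rule aggregate:
    input:
        # ancient() on the *input* tells Snakemake to ignore this file's
        # modification time when deciding whether downstream jobs are stale.
        ancient("data/raw/{sample}.txt")
    output:
        "data/tables/{sample}.tsv"
    shell:
        "cut -f1,3 {input} > {output}"
```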

1 Answer


Use the `--reason` flag: it prints the reason for each executed rule, so a dry run (`snakemake -np --reason`) shows exactly why Snakemake wants to re-run the download jobs.
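The timestamp logic behind the fix in the comments can be illustrated without Snakemake at all: a file that is older than a file it depends on looks stale, and `touch`ing it makes it current again. The paths below are made up for the demo:

```shell
mkdir -p demo
# Simulate a downloaded file copied over with an old timestamp...
touch -t 202001010000 demo/downloaded.txt
# ...and a dependency that carries a newer timestamp.
touch demo/dependency.txt

# The downloaded file is older than its dependency, so it looks stale:
[ demo/dependency.txt -nt demo/downloaded.txt ] && echo "stale"

# Refreshing its modification time resolves the mismatch:
touch demo/downloaded.txt
[ demo/dependency.txt -nt demo/downloaded.txt ] || echo "up to date"
```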

Manavalan Gajapathy