I notice that my backup rsync script spends quite some time copying files with random names from .snakemake/metadata folders.

What are those files used for?

Can I safely erase them after a snakemake run has completed, or are they necessary for snakemake to correctly perform the next run?

More generally, is there some documentation about the files that snakemake creates in the .snakemake folder?

bli

2 Answers

From this comment by Johannes Köster, creator of Snakemake:

[The .snakemake/ directory] is used to track (a) the value of the version keyword for each file, (b) the rule implementation for each file, in order to notify the user if something has changed when snakemake is invoked with --summary.

From a related comment on the Google Group:

In general, it is safe to delete the entire .snakemake directory if there is no running Snakemake instance and you are sure that all existing output files are complete. It only contains data provenance information (e.g., to track code input file or parameter changes [to determine if the workflow should be re-run]). You might want to keep .snakemake/conda, since it contains the conda environments used in your workflow.

Edit: To automatically remove the .snakemake/ directory upon successful execution of the pipeline, the onsuccess hook can be used:

import shutil

# Runs once after the workflow has finished without errors.
onsuccess:
    shutil.rmtree(".snakemake")
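
If you would rather keep `.snakemake/conda`, as the quoted comment suggests, a more selective cleanup is possible. The following is only a minimal sketch: the helper name `clean_snakemake_dir` and its `keep` parameter are hypothetical and not part of Snakemake, and it assumes that everything else under `.snakemake` is disposable for your workflow.

```python
import shutil
from pathlib import Path


def clean_snakemake_dir(workdir=".", keep=("conda",)):
    """Delete everything under <workdir>/.snakemake except entries named in `keep`.

    Hypothetical helper, not a Snakemake API. Returns the sorted names of the
    top-level entries that were removed.
    """
    state_dir = Path(workdir) / ".snakemake"
    removed = []
    if not state_dir.is_dir():
        # Nothing to clean (e.g. the workflow has never been run here).
        return removed
    for entry in state_dir.iterdir():
        if entry.name in keep:
            continue  # e.g. keep the conda environments
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return sorted(removed)
```

The function could be called from `onsuccess` in place of `shutil.rmtree(".snakemake")`, or from a separate script once Snakemake has exited. The same caveats apply as for deleting the whole directory: no Snakemake instance should be running on the workflow, and all output files should be complete.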
tomkinsc
  • Thanks for the answer. Would it work / be safe to put some cleaning code in `onsuccess` to remove some of the things in `.snakemake`? – bli Aug 10 '17 at 13:20
  • I believe that should be OK as long as you don't care about state persistence. It would end up being something like `import shutil`, then `onsuccess: shutil.rmtree(".snakemake")`. You could also chain it to your snakemake invocation: `snakemake --snakefile mysnakefile && rm -r ./.snakemake/` – tomkinsc Aug 10 '17 at 16:54
  • But there is already a flag for this, `[--cleanup-metadata FILE [FILE ...]] [--cleanup-shadow]`, in the [snakemake command line documentation](https://snakemake.readthedocs.io/en/stable/executing/cli.html?highlight=--cleanup-metadata%2F--cm) – Anu Sep 12 '20 at 03:25
Old question now and not really answering it... Since you mention rsync, you can skip .snakemake directories with the --exclude option, like:

rsync ... --exclude='.snakemake' source/ dest/
dariober