
Snakemake allows creating a log for each rule via the `log` parameter, which specifies the name of the log file. It is relatively straightforward to pipe shell output to this log, but I have not been able to figure out a way to log the output of a `run` directive (i.e., Python code).

One workaround is to save the Python code in a script and then run it from the shell, but I wonder whether there is another way?
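For context, the shell-based workaround mentioned above looks roughly like this (a sketch; `compute.py` and the file names are made up):

```
# Hypothetical rule: the Python logic lives in an external script
# (compute.py, a made-up name) and its stdout/stderr are redirected
# to the rule's log file from the shell.
rule compute:
    input: "data.txt"
    output: "result.txt"
    log: "logs/compute.log"
    shell: "python compute.py {input} {output} > {log} 2>&1"
```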

Manavalan Gajapathy
econ
  • Found this [implementation](https://stackoverflow.com/a/45247442/3998252) in the wild but it does seem ugly. – Manavalan Gajapathy Apr 22 '19 at 15:02
  • @JeeYem: thanks for the quick response and the link! It seems that for now, the best approach is to log through the shell (using python script). – econ Apr 22 '19 at 15:52

2 Answers


I have some rules that use both the `log` and `run` directives. In the `run` directive, I open and write the log file "manually".

For instance:

# (pd is pandas; OPJ is presumably os.path.join; names such as
# source_small_RNA_counts and add_tags_column come from elsewhere
# in the author's snakefile)
rule compute_RPM:
    input:
        counts_table = source_small_RNA_counts,
        summary_table = rules.gather_read_counts_summaries.output.summary_table,
        tags_table = rules.associate_small_type.output.tags_table,
    output:
        RPM_table = OPJ(
            annot_counts_dir,
            "all_{mapped_type}_on_%s" % genome, "{small_type}_RPM.txt"),
    log:
        log = OPJ(log_dir, "compute_RPM_{mapped_type}", "{small_type}.log"),
    benchmark:
        OPJ(log_dir, "compute_RPM_{mapped_type}", "{small_type}_benchmark.txt"),
    run:
        with open(log.log, "w") as logfile:
            logfile.write(f"Reading column counts from {input.counts_table}\n")
            counts_data = pd.read_table(
                input.counts_table,
                index_col="gene")
            logfile.write(f"Reading number of non-structural mappers from {input.summary_table}\n")
            norm = pd.read_table(input.summary_table, index_col=0).loc["non_structural"]
            logfile.write(str(norm))
            logfile.write("Computing counts per million non-structural mappers\n")
            RPM = 1000000 * counts_data / norm
            add_tags_column(RPM, input.tags_table, "small_type").to_csv(output.RPM_table, sep="\t")

For third-party code that writes to stdout, the `redirect_stdout` context manager may be helpful (found in https://stackoverflow.com/a/40417352/1878788, documented at https://docs.python.org/3/library/contextlib.html#contextlib.redirect_stdout).

Test snakefile, test_run_log.snakefile:

from contextlib import redirect_stdout

rule all:
    input:
        "test_run_log.txt"

rule test_run_log:
    output:
        "test_run_log.txt"
    log:
        "test_run_log.log"
    run:
        with open(log[0], "w") as log_file:
            with redirect_stdout(log_file):
                print(f"Writing result to {output[0]}")
                with open(output[0], "w") as out_file:
                    out_file.write("result\n")

Running it:

$ snakemake -s test_run_log.snakefile

Results:

$ cat test_run_log.log 
Writing result to test_run_log.txt
$ cat test_run_log.txt 
result
bli
  • This is a really clever solution, but is there a way to get stderr too? When I replace `redirect_stdout` with `redirect_stderr` and raise an exception inside the context manager, I still can't see it in the logs. It's trickier than I expected to redirect stderr inside python. https://stackoverflow.com/questions/1956142/how-to-redirect-stderr-in-python – eric_kernfeld Jul 22 '22 at 13:22
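A follow-up on that comment: `contextlib` does provide `redirect_stderr` (Python 3.5+), but an uncaught exception's traceback is printed only after the `with` block has exited, so it never reaches the redirected stream. One workaround (a sketch, not from the answer above) is to catch the exception inside the block and write the traceback explicitly while redirection is still active:

```python
import io
import traceback
from contextlib import redirect_stderr

def run_logged(task, log_stream):
    """Run task(), sending its stderr writes and any traceback to log_stream."""
    with redirect_stderr(log_stream):
        try:
            task()
        except Exception:
            # The interpreter would print the traceback only after the
            # 'with' block exits, so write it ourselves while stderr is
            # still redirected.
            traceback.print_exc()

log = io.StringIO()
run_logged(lambda: 1 / 0, log)
print("ZeroDivisionError" in log.getvalue())  # → True
```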

My solution was the following. It is useful both for normal logging and for logging exceptions with a traceback. You can wrap the logger setup in a function to make it more organized. It's not very pretty, though; it would be much nicer if Snakemake could do this by itself.

import logging

# some stuff

rule logging_test:
    input: 'input.json'
    output: 'output.json'
    log: 'rules_logs/logging_test.log'
    run:
        logger = logging.getLogger('logging_test')
        logger.setLevel(logging.INFO)  # without this, INFO records are filtered out
        fh = logging.FileHandler(str(log))
        fh.setLevel(logging.INFO)
        formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
        fh.setFormatter(formatter)
        logger.addHandler(fh)

        try:
            logger.info('Starting operation!')
            # do something
            with open(str(output), 'w') as f:
                f.write('success!')
            logger.info('Ended!')
        except Exception as e:
            logger.error(e, exc_info=True)
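The logger setup in the answer above can be wrapped in a helper roughly like this (a sketch; `make_rule_logger` is a made-up name):

```python
import logging

def make_rule_logger(name, logfile):
    """Return a logger that writes timestamped records to logfile.

    Sketch of the helper suggested in the answer; the function name
    is hypothetical.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)  # let INFO records through
    fh = logging.FileHandler(logfile)
    fh.setLevel(logging.INFO)
    fh.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s'))
    logger.addHandler(fh)
    return logger
```

Inside a rule's `run` block the setup then collapses to a single line, e.g. `logger = make_rule_logger('logging_test', str(log))`.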
Daniel Titkov
  • Thanks! This is an interesting approach, right now I am resolving to just running scripts in shell and saving the log (as it looks simpler to read), but I can see how your solution is better in cases where scripts are not possible/desirable. – econ Apr 25 '19 at 16:30