1

I'm trying to use a column in a text file to conditionally execute rules in a snakemake workflow.

The text file is as follows:

id  end sample_name fq1 fq2
a   paired  test_paired resources/SRR1945436_1.fastq.gz resources/SRR1945436_2.fastq.gz
b   single  test_single resources/SRR1945436.fastq.gz   NA

For each sample in the text file, if value in end column is paired I would like to use rule cp_fastq_pe and if end is single then I would like to use rule cp_fastq_pe to process the fq1 & fq2 or just fq1 files, respectively.

relevant part of Snakefile is as follows:

import pandas as pd
samples = pd.read_table("config/samples.tsv").set_index("id", drop=False)
all_ids=list(samples["id"])

rule cp_fastq_pe:
    """
    copy file to resources
    """
    input:
        fq1=lambda wildcards: samples.loc[wildcards.id, "fq1"],
        fq2=lambda wildcards: samples.loc[wildcards.id, "fq2"]
    output:
        "resources/fq/{id}_1.fq.gz",
        "resources/fq/{id}_2.fq.gz"
    shell:
        """
        cp {input.fq1} {output[0]}
        cp {input.fq2} {output[1]}
        """

rule cp_fastq_se:
    """
    copy file to resources
    """
    input:
        fq1=lambda wildcards: samples.loc[wildcards.id, "fq1"]
    output:
        "resources/fq/{id}.fq.gz",
    shell:
        """
        cp {input.fq1} {output}
        """

Is it possible to do this?

cjfiscus
  • 35
  • 4

1 Answers1

2

I had a similar problem, which I solved here: How to make Snakemake input optional but not empty?

Here is the idea adjusted to your problem. First, you need to specify the ruleorder to resolve the ambiguity (otherwise the single could always be applied whenever the paired is possible):

ruleorder: cp_fastq_pe > cp_fastq_se

Next, in your cp_fastq_pe rule you need to define a function that either returns a valid file (for the paired case) or returns a placeholder for non-existing file:

rule cp_fastq_pe:
    input:
        fq1=lambda wildcards: samples.loc[wildcards.id, "fq1"],
        fq2=lambda wildcards: samples.loc[wildcards.id, "fq2"] if "fq2" in samples else "non-existing-filename"

This rule would be applied to all samples wherever "fq2" field exists and represents a valid file. The other rule would be selected to the rest of the samples.

Dmitry Kuzminov
  • 6,180
  • 6
  • 18
  • 40
  • Thank you for the idea, I think this solution is on the right track but as it is the function in fq2 input gives a KeyError. To clarify, the goal is to have two versions of a few rules, with one version (e.g. rule_pe) that operates on two files formatted as follows: {id}_1.fq.gz and {id}_2.fq.gz and the other (rule_se) operating on a single file formatted as follows: {id}.fq.gz. Later in the workflow the output of the two versions of a rule will converge on {id}.txt for both, so I don't think it works to have fq2 be written to a random string for {id}.fq.gz case. – cjfiscus Jan 15 '21 at 20:29
  • Whenever the fq2 is defined as a placeholder of non-existing file, this version of the rule cannot be used (and is not used): that is what you requested, right? – Dmitry Kuzminov Jan 15 '21 at 20:53
  • Yes, that's what I'm looking for. The function doesn't work though. It produces KeyError: 0. – cjfiscus Jan 15 '21 at 22:22
  • Thank you for your efforts on this. It now works when only paired sample is included in the samples file. When both paired and single are included receive a multiple of errors including `File path resources/fq//a.fq.gz contains double '/'.` and `WorkflowError (rule cp_fastq_pe, line 72, /path/to/Snakefile): Function did not return str or list of str.` Maybe it's more feasible to implement solution like [this](https://stackoverflow.com/questions/63323670/snakemake-rna-seq-how-can-i-execute-one-subpart-of-a-pipeline-or-another-subp)? – cjfiscus Jan 17 '21 at 01:37
  • That is up to you which solution to prefer. To fix the problem I need to know both contents of the tsv file and full Snakenake script. Anyway, the advise I gave you is a key to solve the problem as you described in your initial question. – Dmitry Kuzminov Jan 17 '21 at 02:50
  • It appears that the problem was on my end. Your solution works. Thanks! – cjfiscus Jan 18 '21 at 20:39