
I am trying to rename some files in a Snakemake pipeline. Let's say I have three files: "FileA.txt", "FileB.txt", "FileC.txt" and I want them renamed according to a dictionary dict = {"A": "0", "B": "1", "C": "2"} to get "RenamedFile0.txt", "RenamedFile1.txt", and "RenamedFile2.txt". How would one write a rule for this?

This is what my pipeline looks like (I've tried using a function, but it doesn't work):

SAMPLES = ["A", "B", "C"]
RENAMED_SAMPLES = ["0", "1", "2"]

rename = {"0": "A", "1": "B", "2": "C"}

def mapFile(wildcards):
    file = "results/EditedFile" + str(rename[wildcards]) + ".txt"
    return(file)

rule all:
    input:
        "results/Combined.txt"

rule cut:
    input:
        "data/File{sample}.txt"
    output:
        "results/EditedFile{sample}.txt"
    shell:
        "cut -f1 {input} > {output}"

rule rename:
    input:
        mapFile
    output:
        "results/RenamedFile{renamedSample}.txt"
    shell:
        "cp {input} {output}"


rule combine:
    input:
        expand("results/RenamedFile{renamedSample}.txt", renamedSample = RENAMED_SAMPLES)
    output:
        "results/Combined.txt"
    shell:
        "cat {input} > {output}"

I get the following error:

KeyError: ['2']
Wildcards:
renamedSample=2

Thanks!!!

SultanOrazbayev
zzabaa

1 Answer


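When using an input function, the wildcard has to be accessed by name: `wildcards` is an object holding all of the job's wildcard values, not the value itself, so look up `wildcards.renamedSample`: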

def mapFile(wildcards):
    # Map the renamed id (e.g. "2") back to the original sample name (e.g. "C")
    return "results/EditedFile" + rename[wildcards.renamedSample] + ".txt"
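To see the difference outside Snakemake, the snippet below uses `types.SimpleNamespace` as a hypothetical stand-in for the wildcards object Snakemake passes to input functions. The real object is Snakemake's own class, so this only mimics attribute access; it is a sketch, not how Snakemake builds it. Looking up the named attribute returns the original sample name, while using the whole object as the dictionary key fails:

```python
from types import SimpleNamespace

rename = {"0": "A", "1": "B", "2": "C"}

def mapFile(wildcards):
    # Look up the original sample name by the wildcard's name, not the whole object
    return "results/EditedFile" + rename[wildcards.renamedSample] + ".txt"

# Hypothetical stand-in for the wildcards of the job producing RenamedFile2.txt
wildcards = SimpleNamespace(renamedSample="2")

print(mapFile(wildcards))  # results/EditedFileC.txt

try:
    rename[wildcards]  # the original bug: the key is the object, not "2"
except (KeyError, TypeError) as err:
    print("lookup with the whole object failed:", type(err).__name__)
```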

In this specific case, the logic can also be written inline in the rule as a lambda:

rule rename:
    input:
        lambda wildcards: f"results/EditedFile{rename[wildcards.renamedSample]}.txt"
    output:
        "results/RenamedFile{renamedSample}.txt"
    shell:
        "cp {input} {output}"
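The question states the mapping in the forward direction ({"A": "0", ...}), while the rule needs the reverse lookup. A minimal sketch, assuming plain Python at the top of the Snakefile, that derives `rename` by inverting the forward dictionary (called `sample_to_renamed` here, a hypothetical name that avoids shadowing the built-in `dict`) and prints which input each `RenamedFile` output would resolve to:

```python
SAMPLES = ["A", "B", "C"]

# Forward mapping from the question; hypothetical name to avoid shadowing dict
sample_to_renamed = {"A": "0", "B": "1", "C": "2"}

# Reverse it so each renamed id points back to its original sample
# (only valid if the renamed ids are unique)
rename = {renamed: sample for sample, renamed in sample_to_renamed.items()}

# Derive the renamed ids from SAMPLES so the two lists can't drift apart
RENAMED_SAMPLES = [sample_to_renamed[s] for s in SAMPLES]

def input_for(renamed_sample):
    # Same lookup the lambda in rule rename performs
    return f"results/EditedFile{rename[renamed_sample]}.txt"

for r in RENAMED_SAMPLES:
    print(f"results/RenamedFile{r}.txt <- {input_for(r)}")
# results/RenamedFile0.txt <- results/EditedFileA.txt
# results/RenamedFile1.txt <- results/EditedFileB.txt
# results/RenamedFile2.txt <- results/EditedFileC.txt
```

Deriving both structures from one dictionary means adding a sample only requires editing `sample_to_renamed` and `SAMPLES`.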
SultanOrazbayev
This answers the question, but a few other points. Generally you don't want to delete files yourself. Since the rule is called rename, you might expect to call `mv` instead of `cp`, but that would cause problems with the DAG, because the output of rule `cut` would no longer exist. If you have a giant file, `cp` will be slow, so consider using symbolic links instead. Finally, the best solution is to name your output what you want to begin with (here, in rule `cut`). I know that's not possible for some tools with predefined output names, but in those cases put the `mv` in the rule that generates the outputs. – Troy Comi Aug 01 '22 at 13:18
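For the symbolic-link suggestion, here is a minimal Python sketch of a helper that a `run:` block in rule rename might call instead of `cp` (the name `link_relative` is hypothetical, and this has not been checked against any particular Snakemake version). It creates a relative link, so the `results` directory still works if it is moved as a whole:

```python
import os
import tempfile

def link_relative(src, dst):
    """Create dst as a symlink to src, using a path relative to dst's directory."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    os.makedirs(dst_dir, exist_ok=True)
    target = os.path.relpath(os.path.abspath(src), dst_dir)
    os.symlink(target, dst)

# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "results", "EditedFileC.txt")
    dst = os.path.join(tmp, "results", "RenamedFile2.txt")
    os.makedirs(os.path.dirname(src))
    with open(src, "w") as fh:
        fh.write("data\n")
    link_relative(src, dst)
    print(os.readlink(dst))  # EditedFileC.txt
    with open(dst) as fh:
        print(fh.read().strip())  # data
```

Inside the rule this would be called as `link_relative(input[0], output[0])`.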