I'm struggling to integrate my sample sheet (TSV) into my Snakemake pipeline. Specifically, I want to define the values of the sample wildcard manually instead of inferring them from a path, because not all samples in a directory are supposed to be analysed. Instead, I made a sample sheet that lists each sample's name, the path to its FASTQ files, its reference genome, etc.
The sheet looks like this:
name path reference
sample1 path/to/fastq/files mm9
sample2 path/to/fastq/files mm9
I load the sheet in my Snakefile:
import pandas as pd

# Sample sheet indexed by sample name; config["samples"] points to the TSV
table_samples = pd.read_table(config["samples"], index_col="name")
SAMPLES = table_samples.index.values.tolist()
The first rule is supposed to merge the FASTQ files inside, so it would be nice to do something like this:
rule merge_fastq:
    output: "{sample}/{sample}.fastq.gz"
    params: path = table_samples['path'][{sample}]
    shell: """
        cat {params.path}/*.fastq.gz > {output}
        """
But as written above it doesn't work, because the sample wildcard is not defined when the params line is evaluated. Is there a way to tell Snakemake that the list I defined above (SAMPLES) contains all the samples for which the rules should be executed?
I honestly feel stupid asking this question, but I've already spent a couple of hours searching for a solution and at this point I need to be a bit more time efficient :D
Thanks!
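One possible way to approach this, as a minimal untested sketch rather than a definitive answer: Snakemake fills in wildcards by working backwards from the files that are requested, so a target rule that asks for one merged file per entry in SAMPLES restricts the workflow to the samples in the sheet. The per-sample path can then be looked up with a function in params, which Snakemake calls with the wildcards once they are known. The rule name all and the use of .loc for the lookup are assumptions added for illustration; the column names and file pattern are taken from the question.

```snakemake
import pandas as pd

# Sample sheet indexed by sample name, as in the question
table_samples = pd.read_table(config["samples"], index_col="name")
SAMPLES = table_samples.index.tolist()

# Target rule (the name "all" is an assumption): requesting one merged
# file per sheet entry limits the {sample} wildcard to the names in SAMPLES.
rule all:
    input:
        expand("{sample}/{sample}.fastq.gz", sample=SAMPLES)

rule merge_fastq:
    output:
        "{sample}/{sample}.fastq.gz"
    params:
        # Called once per job, after the wildcard value is known;
        # assumes sample names are unique in the sheet.
        path=lambda wildcards: table_samples.loc[wildcards.sample, "path"]
    shell:
        "cat {params.path}/*.fastq.gz > {output}"
```

With the target rule placed first in the Snakefile, running snakemake without an explicit target should build only the files for the samples listed in the sheet. Note that this sketch declares no input files for merge_fastq, so Snakemake would not rerun the merge when the source FASTQ files change.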