I want to split multiple BAM files into a pre-determined number of smaller BAM files. I do not know how to specify the output, because the number of smaller BAM files varies depending on which sample I am splitting.
I have read https://bitbucket.org/snakemake/snakemake/issues/865/pre-determined-dynamic-output
but I do not see how a checkpoint helps in my case.
SAMPLE_cluster = { "SampleA" : [ "1", "2", "3" ], "SampleB" : [ "1" ], "SampleC" : [ "1", "2" ] }

rule split_bam:
    input: "{sample}.bam"
    output: expand("split_bam/{{sample}}_{cluster_id}.bam", cluster_id = ?)
    shell:
        """
        split_bam {input} {output}
        """

rule index_split_bam:
    input: "split_bam/{sample}_{cluster_id}.bam"
    output: "split_bam/{sample}_{cluster_id}.bam.bai"
    shell:
        """
        samtools index {input}
        """
A for loop, as in the link above, works for me, but I dislike the anonymous rule it requires.
How do I specify the output for the split_bam rule? I have read "Snakemake: unknown output/input files after splitting by chromosome"; that approach works because the number of chromosomes is fixed for a single sample. With multiple samples whose chromosome counts differ, it would be the same problem as mine.
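To make the expected result concrete, here is a minimal plain-Python sketch of the file lists I want for each sample, built from the same SAMPLE_cluster dictionary. The helper names split_bam_outputs and all_split_bam_indexes are hypothetical and not part of Snakemake. As far as I understand, a function like this can be used for a rule's input, but not directly for output, which is exactly where I am stuck.

```python
# Per-sample cluster IDs, copied from the Snakefile in the question.
SAMPLE_cluster = {
    "SampleA": ["1", "2", "3"],
    "SampleB": ["1"],
    "SampleC": ["1", "2"],
}


def split_bam_outputs(sample):
    """Hypothetical helper: split BAM paths expected for one sample."""
    return [
        f"split_bam/{sample}_{cluster_id}.bam"
        for cluster_id in SAMPLE_cluster[sample]
    ]


def all_split_bam_indexes():
    """Hypothetical helper: every .bai index expected across all samples."""
    return [
        path + ".bai"
        for sample in SAMPLE_cluster
        for path in split_bam_outputs(sample)
    ]


if __name__ == "__main__":
    for sample in SAMPLE_cluster:
        print(sample, split_bam_outputs(sample))
```

The number of outputs differs per sample (three for SampleA, one for SampleB, two for SampleC), which is why a single fixed expand() over cluster_id does not fit.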