
I want to split multiple BAM files into a pre-determined number of smaller BAM files. I do not know how to specify the output, because the number of smaller BAM files varies depending on which sample I am splitting.

I have read https://bitbucket.org/snakemake/snakemake/issues/865/pre-determined-dynamic-output

I do not see how a checkpoint would help in my case.

SAMPLE_cluster = { "SampleA" : [ "1", "2", "3" ], "SampleB" :  [ "1" ], "SampleC" : [ "1", "2" ] }

rule split_bam:
    input: "{sample}.bam"
    output: expand("split_bam/{{sample}}_{cluster_id}.bam", cluster_id = ?)
    shell:
       """
       split_bam {input} {output}
       """
rule index_split_bam:
    input: "split_bam/{sample}_{cluster_id}.bam"
    output: "split_bam/{sample}_{cluster_id}.bam.bai"
    shell:
        """
        samtools index {input}
        """

A for loop, as in the link above, works for me, but the anonymous rules it creates annoy me.

How do I specify the output for the split_bam rule? I have read "Snakemake: unknown output/input files after splitting by chromosome". That approach works because the number of chromosomes is fixed for a single sample. If there were multiple samples with a different number of chromosomes in each, it would be similar to my problem.

crazyhottommy
  • Try to think in the opposite direction. What do you need as the final result? What is needed for this final result, etc.? – Dmitry Kuzminov Jun 18 '19 at 03:04
  • In this case, my final results are just the split BAM files and their indexes. This is a single-cell RNA-seq BAM, and I need to split the merged BAM by cluster. I have a workaround that uses touch files as the output of split_bam, but it is not a solution I am happy with. – crazyhottommy Jun 18 '19 at 06:34
  • Indeed, if the clusters are known in advance, using a for loop creating a rule for each of the samples is the easiest way. And yes, no way to get proper names for such rules for now. Checkpoints would work as well. The key would be to have a directory as output of the split_bam rule. See the clustering example here: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#data-dependent-conditional-execution – Johannes Köster Jun 18 '19 at 18:24
  • Thanks! I now see how a checkpoint can help here. For now the touch trick works for me and seems simpler than the checkpoint way. I will stick with it and may find other use cases for checkpoints later. – crazyhottommy Jun 19 '19 at 19:12
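
Since the clusters are known in advance, the per-sample output lists that the for-loop approach relies on can be computed in plain Python from SAMPLE_cluster. The following is only a minimal sketch: the helper names split_bam_outputs and all_targets are hypothetical and do not come from the question or from Snakemake, and the path layout is copied from the rules in the question.

```python
# Cluster IDs per sample, copied from the question.
SAMPLE_cluster = {
    "SampleA": ["1", "2", "3"],
    "SampleB": ["1"],
    "SampleC": ["1", "2"],
}


def split_bam_outputs(sample, sample_cluster=SAMPLE_cluster):
    """Return the split BAM paths expected for one sample (hypothetical helper)."""
    return [
        f"split_bam/{sample}_{cluster_id}.bam"
        for cluster_id in sample_cluster[sample]
    ]


def all_targets(sample_cluster=SAMPLE_cluster):
    """Return every split BAM and its index, in sample order (hypothetical helper)."""
    targets = []
    for sample in sample_cluster:
        for bam in split_bam_outputs(sample, sample_cluster):
            targets.append(bam)           # output of split_bam
            targets.append(bam + ".bai")  # output of index_split_bam
    return targets


if __name__ == "__main__":
    for sample in SAMPLE_cluster:
        print(sample, split_bam_outputs(sample))
```

Under these assumptions, a Snakefile could loop over SAMPLE_cluster and pass split_bam_outputs(sample) as the output of the rule generated for each sample, and use all_targets() as the input of a target rule. This does not address the anonymous-rule naming issue mentioned above.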
