Within my snakemake pipeline I'm trying to retrieve the correct wildcards. I've looked into wildcard_constraints and this post and this post, however I can't figure out the exact solution.
Here's an example of file names within 2 datasets. 1 dataset contains paired mouse RNAseq read files and another dataset contains human paired RNAseq read files.
"Mus_musculus" dataset is "PRJNA362883_GSE93946_SRP097621" with file names:
"SRR5195524_GSM2465521_KrasT_45649_NoDox_Mus_musculus_RNA-Seq_1.fastq.gz" "SRR5195524_GSM2465521_KrasT_45649_NoDox_Mus_musculus_RNA-Seq_2.fastq.gz"
"Homo_sapiens" dataset is "PRJNA362883_GSE93946_SRP097621" with file names:
"SRR7942395_GSM3406786_sAML_Control_1_Homo_sapiens_RNA-Seq_1.fastq.gz" "SRR7942395_GSM3406786_sAML_Control_1_Homo_sapiens_RNA-Seq_2.fastq.gz"
I would like glob_wildcards to spit out the following wildcards
> ['PRJNA362883_GSE93946_SRP097621', 'PRJNA362883_GSE93946_SRP097621',
> 'PRJNA493818_GSE120639_SRP162872', 'PRJNA493818_GSE120639_SRP162872']
> ['SRR5195524_GSM2465521_KrasT_45649_NoDox',
> 'SRR5195524_GSM2465521_KrasT_45649_NoDox',
> 'SRR7942395_GSM3406786_sAML_Control_1',
> 'SRR7942395_GSM3406786_sAML_Control_1'] ['Mus_musculus', 'Mus_musculus',
> 'Homo_sapiens', 'Homo_sapiens'] ['1', '2', '1', '2']
I've tried the following code:
> import glob import os
>
> DATASET,SAMPLE,SPECIES,FRR,
> =glob_wildcards(config["project_path"]+"resources/raw_datasets/{dataset}/{sample}_{species}_RNA-Seq_{frr}.fastq.gz")
> print(DATASET,SAMPLE,SPECIES,FRR)
However, I get this as output. The many underscores messes up the glob_wildcards
> ['PRJNA362883_GSE93946_SRP097621', 'PRJNA362883_GSE93946_SRP097621',
> 'PRJNA493818_GSE120639_SRP162872', 'PRJNA493818_GSE120639_SRP162872']
> ['SRR5195524_GSM2465521_KrasT_45649_NoDox_Mus',
> 'SRR5195524_GSM2465521_KrasT_45649_NoDox_Mus',
> 'SRR7942395_GSM3406786_sAML_Control_1_Homo',
> 'SRR7942395_GSM3406786_sAML_Control_1_Homo'] ['musculus', 'musculus',
> 'sapiens', 'sapiens'] ['1', '2', '1', '2']
I for example tried this, but the output stays the same: wildcard_constraints: species = '!(sapiens)'
Could anyone suggest the correct code to get the wanted wildcards? Thanks in advance!