I believe the following demonstrates what you're trying to achieve:
# Snakefile
rule sam_startswith_dna:
    output: '{pattern}.sam'
    wildcard_constraints: pattern='dna.+'
    shell: 'touch {output}'

rule sam_not_startswith_dna:
    output: '{pattern}.sam'
    wildcard_constraints: pattern='(?!dna).+'  # negative lookahead assertion
    shell: 'touch {output}'

rule bam_endswith_rna:
    output: '{pattern}.bam'
    wildcard_constraints: pattern='.+rna'
    shell: 'touch {output}'

rule bam_not_endswith_rna:
    output: '{pattern}.bam'
    wildcard_constraints: pattern='.+(?<!rna)'  # negative lookbehind assertion
    shell: 'touch {output}'
Using it (snakemake 4.6.0, python 3.6):
$ snakemake -n dna_sample.sam   # runs rule: sam_startswith_dna
$ snakemake -n sample.sam       # runs rule: sam_not_startswith_dna
$ snakemake -n sample_dna.sam   # runs rule: sam_not_startswith_dna
$ snakemake -n sample_rna.bam   # runs rule: bam_endswith_rna
$ snakemake -n sample.bam       # runs rule: bam_not_endswith_rna
$ snakemake -n rna_sample.bam   # runs rule: bam_not_endswith_rna
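If you want to sanity-check the four constraints outside of Snakemake, here is a minimal Python sketch. It assumes the constraint has to match the whole wildcard value (hence `re.fullmatch`); `RULES` and `pick_rules` are names I made up for illustration and are not part of Snakemake.

```python
import re

# (rule name, output extension, wildcard constraint), copied from the Snakefile above
RULES = [
    ('sam_startswith_dna', '.sam', r'dna.+'),
    ('sam_not_startswith_dna', '.sam', r'(?!dna).+'),
    ('bam_endswith_rna', '.bam', r'.+rna'),
    ('bam_not_endswith_rna', '.bam', r'.+(?<!rna)'),
]

def pick_rules(target):
    """Hypothetical helper: names of the rules whose constraint accepts the target."""
    hits = []
    for name, ext, constraint in RULES:
        if not target.endswith(ext):
            continue
        value = target[:-len(ext)]  # the part that {pattern} would capture
        # Assumption: the constraint must match the whole wildcard value.
        if re.fullmatch(constraint, value):
            hits.append(name)
    return hits

for target in ['dna_sample.sam', 'sample.sam', 'sample_dna.sam',
               'sample_rna.bam', 'sample.bam', 'rna_sample.bam']:
    print(target, pick_rules(target))
```

Each target should be accepted by exactly one rule, which is why Snakemake has no ambiguity to complain about.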
Here's what I think you were doing:
# Snakefile2
rule sam_startswith_dna_:
    output: '{pattern}.sam'
    wildcard_constraints: pattern='dna_.+'
    shell: 'touch {output}'

rule sam_not_startswith_dna_:
    output: '{pattern}.sam'
    wildcard_constraints: pattern='(?!dna)_.+'
    shell: 'touch {output}'
Using it:
$ snakemake -s Snakefile2 dna_data.sam # runs rule: sam_startswith_dna_
$ snakemake -s Snakefile2 rna_data.sam # raises MissingRuleException :( :( :(
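The reason it fails: a lookahead is zero-width, so `(?!dna)` consumes nothing and the `_` that follows has to be the very first character of the wildcard. A small sketch in plain Python (again assuming the constraint must match the whole wildcard value; the variable names are mine):

```python
import re

broken = r'(?!dna)_.+'  # constraint from Snakefile2

# (?!dna) only asserts "not followed by dna" at position 0; it does not skip
# over "rna". The next token, "_", must therefore match the first character.
rna_hit = re.fullmatch(broken, 'rna_data')
underscore_hit = re.fullmatch(broken, '_data')

print(rna_hit)         # no match: the first character is "r", not "_"
print(underscore_hit)  # a match: the value really does start with "_"
```

In other words, the constraint only accepts values that start with an underscore, which is not what was intended.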
Here's how you could have fixed it:
# Snakefile3
rule sam_startswith_dna_:
    output: '{pattern}.sam'
    wildcard_constraints: pattern='dna_.+'
    shell: 'touch {output}'

rule sam_not_startswith_dna_:
    output: '{pattern}.sam'
    wildcard_constraints: pattern='(?!dna)[^_]{3}_.+'
    shell: 'touch {output}'
Using it:
$ snakemake -s Snakefile3 -n dna_data.sam # runs rule: sam_startswith_dna_
$ snakemake -s Snakefile3 -n rna_data.sam # runs rule: sam_not_startswith_dna_
But it's not very general because of the hardcoded {3}:
$ snakemake -s Snakefile3 -n gdna_data.sam # raises MissingRuleException
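One way to drop the hardcoded length would be to put the underscore inside the lookahead, i.e. `(?!dna_).+`, mirroring the first Snakefile. This is my own suggestion and I have only checked the regex in plain Python, not in Snakemake; the sketch assumes the constraint must match the whole wildcard value, and `not_dna_prefixed` is a made-up helper name.

```python
import re

general = r'(?!dna_).+'  # suggested constraint, not one of the Snakefiles above

def not_dna_prefixed(value):
    """Hypothetical helper: True if the value is accepted by the constraint."""
    return re.fullmatch(general, value) is not None

for value in ['rna_data', 'gdna_data', 'dna_data']:
    print(value, not_dna_prefixed(value))
```

Both `rna_data` and `gdna_data` are accepted, while `dna_data` is rejected and left to the `dna_.+` rule.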
The following is based on my brief reading of snakemake.io.regex and some poking around; it may contain errors.
In general, given a rule like this:
rule some_rule:
    output: 'some.{pattern}.txt'
    wildcard_constraints: pattern='[a-z_]+'
    shell: 'touch {output}'
and a command line invocation like this:
$ snakemake some.tar_get.txt
the rule some_rule will be executed if
re.search(r'some\.(?P<pattern>[a-z_]+)\.txt$', 'some.tar_get.txt')
returns a match (assuming other checks pass, e.g. ambiguity, cyclic DAG, etc.).
Interestingly, $ is appended to the pattern, but ^ isn't prepended.
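Here is a rough sketch of how such a regex could be assembled from the output pattern and the constraints, following the description above. `build_regex` is a hypothetical helper I wrote for illustration, not Snakemake's actual code, and I have not verified which `re` function Snakemake ends up calling on the result.

```python
import re

def build_regex(template, constraints):
    """Hypothetical helper: turn 'some.{pattern}.txt' into a regex string."""
    parts = []
    last = 0
    for m in re.finditer(r'\{(\w+)\}', template):
        # Literal text between wildcards is escaped so "." means a real dot.
        parts.append(re.escape(template[last:m.start()]))
        name = m.group(1)
        # Unconstrained wildcards fall back to ".+" (an assumption of this sketch).
        parts.append('(?P<{}>{})'.format(name, constraints.get(name, '.+')))
        last = m.end()
    parts.append(re.escape(template[last:]))
    parts.append('$')  # anchored at the end only, as described above
    return ''.join(parts)

regex = build_regex('some.{pattern}.txt', {'pattern': '[a-z_]+'})
print(regex)

hit = re.search(regex, 'some.tar_get.txt')
print(hit.group('pattern'))

# The trailing "$" rejects anything after ".txt" ...
print(re.search(regex, 'some.tar_get.txt.bak'))
# ... but with re.search and no leading "^", extra leading text is tolerated.
print(re.search(regex, 'awesome.tar_get.txt'))
```

The last call is the consequence of the missing ^ when the regex is used with `re.search`: the match is allowed to start in the middle of the target.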
This behavior differs from what I initially assumed, which went something like this (and which would have allowed the use of ^ and $ in your wildcard_constraints):
# python3, pseudo-code-ish
import re

output = 'some.{pattern}.txt'
pattern = '[a-z_]+'
target = 'some.tar_get.txt'

# First test: does the target file name match the output (without the constraint)?
m = re.search(r'some\.(?P<pattern>.+)\.txt', target)
if not m:
    raise MissingInputException

# Second test: does the wildcard satisfy the user-supplied constraint?
m = re.search(pattern, m.group('pattern'))
if not m:
    raise MissingInputException

run_rule()
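To make the difference concrete, here is a small plain-Python comparison of the two models with a constraint that uses ^ and $. It only exercises the `re` module, not Snakemake, and the variable names are mine.

```python
import re

target = 'some.tar_get.txt'
anchored = r'^[a-z_]+$'  # a constraint that uses ^ and $

# Single-regex model: the constraint is embedded in the middle of one pattern,
# so "^" sits after "some." where it can never match.
combined = r'some\.(?P<pattern>' + anchored + r')\.txt$'
single = re.search(combined, target)

# Two-step model: extract the wildcard first, then test the constraint on it alone.
first = re.search(r'some\.(?P<pattern>.+)\.txt', target)
two_step = re.search(anchored, first.group('pattern'))

print(single)    # no match
print(two_step)  # a match on "tar_get"
```

So anchors inside a constraint only make sense under the two-step model; when the constraint is embedded into one larger regex, they stop the rule from matching at all.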