I want to search for AA*ZZ only if * does not contain XX.
For 2 strings:
"IY**AA**BMDHRPONWUY**ZZ**"
"BV**AA**BDMYB**XX**W**ZZ**CKU"
How can I write a regex that matches only the first one?
If you only want to match characters A-Z, you might use
AA(?:[A-WYZ]|X(?!X))*ZZ
Explanation
AA         Match literally
(?:        Start a non-capturing group
  [A-WYZ]  Match A-Z except X
  |        or
  X(?!X)   Match X and assert what is directly to the right is not X
)*         Close the non-capturing group and repeat 0+ times
ZZ         Match literally
If there can also be other characters, another option could be to use a negated character class [^\sX] matching any char except X or a whitespace char:
AA(?:[^\sX]|X(?!X))*ZZ
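As a quick check, the two patterns above can be run against the strings from the question. This is only a minimal sketch: the helper name first_match and the constant names are made up for illustration, and the sample strings are written without the bold markers.

```python
import re

# Patterns quoted from the answer above.
LETTERS_ONLY = re.compile(r"AA(?:[A-WYZ]|X(?!X))*ZZ")
ANY_NON_SPACE = re.compile(r"AA(?:[^\sX]|X(?!X))*ZZ")

# The two strings from the question.
samples = ["IYAABMDHRPONWUYZZ", "BVAABDMYBXXWZZCKU"]


def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches.

    Hypothetical helper, not part of the original answer.
    """
    m = pattern.search(text)
    return m.group(0) if m else None


for pattern in (LETTERS_ONLY, ANY_NON_SPACE):
    # Print the pattern followed by the result for each sample string.
    print(pattern.pattern, [first_match(pattern, s) for s in samples])
```

A single X between AA and ZZ is still allowed by both patterns; only two consecutive X characters block the match.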
Another option is to use a tempered greedy token:
AA(?:(?!XX).)*ZZ
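Applied to the question, a tempered greedy token would check before every consumed character that XX does not start at that position. The sketch below assumes AA and ZZ as delimiters and XX as the forbidden substring, as in the question; the names TEMPERED and find_segment are made up for illustration.

```python
import re

# Assumption: delimiters AA / ZZ and forbidden substring XX, as in the question.
# (?:(?!XX).)* consumes one character at a time, but only while the text
# at the current position does not start with XX.
TEMPERED = re.compile(r"AA(?:(?!XX).)*ZZ")


def find_segment(text):
    """Return the first AA...ZZ segment without XX, or None (hypothetical helper)."""
    m = TEMPERED.search(text)
    return m.group(0) if m else None


print(find_segment("IYAABMDHRPONWUYZZ"))
print(find_segment("BVAABDMYBXXWZZCKU"))
```

Unlike the character-class variants, the dot also accepts spaces and other non-letter characters, so this form may be the more general choice when the input is not restricted to A-Z.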
Posting my original comment to the question as an answer
Apart from the "single-regex" solutions already posted, think about this solution:
1. Find all strings between AA and ZZ, for example with this regex: AA(.+)ZZ. Store all matches in a list.
2. Loop through the matches and filter out the ones that contain XX. You do not even need to use a regex for that, as most languages, including Python, have dedicated string methods for it.
What you get in return is a clean solution, without any complicated regexes. It's easy to read, easy to maintain, and if any new conditions are to be added, they can be applied to the final result.
To support it with some code:
import re
test_str = """
IYAABMDHRPONWUYZZ
BVAABDMYBXXWZZCKU
"""
# First step: find all strings between AA and ZZ
match_results = re.findall("AA(.+)ZZ", test_str, re.I)
# Second step: filter out the ones that contain XX
final_results = [match for match in match_results if "XX" not in match]
print(final_results)
As for the part assigned to final_results, it's called a list comprehension. Since it's not part of the question, I'll not explain it here.
My guess is that you might want to design an expression similar to:
^(?!.*(?=AA.*XX.*ZZ).*).*AA.*ZZ.*$
import re
regex = r"^(?!.*(?=AA.*XX.*ZZ).*).*AA.*ZZ.*$"
test_str = """
IYAABMDHRPONWUYZZ
BVAABDMYBXXWZZCKU
AABMDHRPONWUYXxXxXxZZ
"""
print(re.findall(regex, test_str, re.M))
Output:
['IYAABMDHRPONWUYZZ', 'AABMDHRPONWUYXxXxXxZZ']
The expression is explained on the top right panel of regex101.com, if you wish to explore, simplify, or modify it. There you can also watch how it matches against some sample inputs, if you like.
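If you do want to simplify it, one possible candidate drops the nested lookahead, since the inner (?=...) followed by .* only asserts that AA.*XX.*ZZ occurs somewhere on the line. The sketch below compares the two forms on the sample lines; it is a sanity check rather than a proof, and the names SIMPLIFIED and matching_lines are made up for illustration.

```python
import re

# The expression from the answer above.
ORIGINAL = r"^(?!.*(?=AA.*XX.*ZZ).*).*AA.*ZZ.*$"
# Assumed simplification: a plain negative lookahead without the nested (?=...).
SIMPLIFIED = r"^(?!.*AA.*XX.*ZZ).*AA.*ZZ.*$"

lines = ["IYAABMDHRPONWUYZZ", "BVAABDMYBXXWZZCKU", "AABMDHRPONWUYXxXxXxZZ"]


def matching_lines(pattern, candidates):
    """Return the candidate lines accepted by the pattern (hypothetical helper)."""
    compiled = re.compile(pattern)
    return [line for line in candidates if compiled.search(line)]


original_hits = matching_lines(ORIGINAL, lines)
simplified_hits = matching_lines(SIMPLIFIED, lines)

# Both forms should accept the same lines on these samples.
print(original_hits == simplified_hits, simplified_hits)
```

Note that both forms reject a whole line as soon as any AA...XX...ZZ sequence appears in it, and that the XX test is case-sensitive, which is why the line containing XxXxXx is accepted.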