111

I have a Python template engine that makes heavy use of regular expressions. It builds them by concatenation, like this:

re.compile( regexp1 + "|" + regexp2 + "*|" + regexp3 + "+" )

I can modify the individual substrings (regexp1, regexp2 etc).

Is there any small and light expression that matches nothing, which I can use inside a template where I don't want any matches? Unfortunately, sometimes '+' or '*' is appended to the regexp atom so I can't use an empty string - that will raise a "nothing to repeat" error.
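For reference, a minimal sketch of the failure described above. The helper name `build` and the fragment values are made up for illustration; it only mirrors the concatenation shown in the question.

```python
import re

def build(regexp1, regexp2, regexp3):
    # Mirrors the concatenation from the question.
    return re.compile(regexp1 + "|" + regexp2 + "*|" + regexp3 + "+")

# Ordinary fragments compile fine.
pattern = build("a", "b", "c")

# Empty fragments leave '*' and '+' with nothing in front of them,
# so compilation fails with re.error.
try:
    build("", "", "")
    empty_ok = True
except re.error as exc:
    empty_ok = False
    print("re.error:", exc)
```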

Adam Liss
grigoryvp
  • 1
    http://stackoverflow.com/questions/1723182/a-regex-that-will-never-be-matched-by-anything – Paul Tomblin Dec 30 '09 at 16:38
  • 5
    Could the title be better worded as "Regular expression to fail to match anything"? Matching nothing implies a successful match of an empty string. – BamaPookie May 15 '18 at 18:03

6 Answers

159

This shouldn't match anything:

re.compile('$^')

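So if you replace regexp1, regexp2 and regexp3 with '$^', it will be impossible to find a match in any non-empty single-line string, unless you are using multiline mode. Note that it can still match an empty string, as the comments point out.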


After some tests I found a better solution:

re.compile('a^')

It is impossible to match and will fail earlier than the previous solution. You can replace a with any other character and it will still be impossible to match.
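A rough way to compare the two patterns in Python's `re` is sketched below. The sample strings and the helper `matching_samples` are illustrative assumptions, not an exhaustive test.

```python
import re

SAMPLES = ["", "a", "a^", "foo\nbar", "\n", "a\n^"]

def matching_samples(pattern, flags=0):
    # Return the samples in which the pattern is found anywhere.
    compiled = re.compile(pattern, flags)
    return [s for s in SAMPLES if compiled.search(s) is not None]

# 'a^' needs a start-of-string (or start-of-line) position right after an 'a'.
a_caret_default = matching_samples(r"a^")
a_caret_multiline = matching_samples(r"a^", re.MULTILINE)

# '$^' can succeed where an end and a start coincide, e.g. in the empty string.
dollar_caret_default = matching_samples(r"$^")

print(a_caret_default, a_caret_multiline, dollar_caret_default)
```

On these samples, 'a^' finds nothing in either mode, while '$^' does find the empty string.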

Nadia Alramli
  • That will not match anything for sure and is lightweight for regexp engine to process? (don't want my stub regexps to eat a lot of cpu) – grigoryvp Jun 02 '09 at 17:37
  • @Eye of hell. It should be lightweight. This will try to match a line end followed by a line start. Which is impossible in one line. – Nadia Alramli Jun 02 '09 at 17:46
  • 1
    But possible with multiple lines of course (depending on if the flag is enabled) - for a solution that works whether the flag is enabled or not, see my answer. – Peter Boughton Jun 02 '09 at 17:52
  • 24
    The regex "$^" matches the empty string, at least in some implementations. The second one is better. – Roman Starkov Nov 29 '10 at 18:06
  • @romkyns Second one does not match empty string in my call to PyQt4 `QtCore.QRegExp`. So bad, as it would surely have been lighter to execute. – Joël Jan 24 '14 at 13:27
  • This matches an empty string in ruby. – imnotquitejack Jul 07 '17 at 13:10
  • And in Java. Not sure why though. – shmosel Jan 01 '18 at 23:09
  • 1
    I tried using "a^", but this does not work: That string matches itself! Apparently a ^ that's not at the beginning of a regular expression matches a normal ^ character. Instead, I found this to work: "^(?!x)x" – Jean-François Larvoire May 18 '21 at 13:52
  • I'd recommend `re.compile('matchnothing^')` or similar to make it more readable for those of us not versed in regex-fu – beyarkay Sep 23 '22 at 10:33
61

(?!) should always fail to match. It is the zero-width negative look-ahead. If what is in the parentheses matches then the whole match fails. Given that it has nothing in it, it will fail the match for anything (including nothing).
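A small Python sketch of this, assuming the standard `re` module (which accepts an empty negative look-ahead). The sample strings are arbitrary.

```python
import re

NEVER = re.compile(r"(?!)")  # empty negative look-ahead

samples = ["", "a", "foo\nbar", "(?!)"]
never_hits = [s for s in samples if NEVER.search(s) is not None]

# Used as one branch of an alternation, it contributes no matches of its own.
combined = re.compile(r"(?!)|cat")

print(never_hits)
print(combined.search("a cat"), combined.search("dog"))
```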

Chas. Owens
  • 4
    Right, I was just going to post this too. This is the best way, if your language supports lookaheads. Likewise (?=) matches every string. – Brian Carper Jun 02 '09 at 22:06
18

To match an empty string - even in multiline mode - you can use \A\Z, so:

re.compile(r'\A\Z|\A\Z*|\A\Z+')

The difference is that \A and \Z match only at the start and end of the string, whilst ^ and $ can also match at the start and end of lines, so $^|$^*|$^+ could potentially match a string containing newlines (if the flag is enabled).

And to fail to match anything (even an empty string), simply attempt to find content before the start of the string, e.g.:

re.compile(r'.\A|.\A*|.\A+')

Since no characters can come before \A (by definition), this will always fail to match.
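The two atoms can be checked in Python roughly as follows. The helper `grouped` is a hypothetical addition, not part of the answer: it wraps the atom in a non-capturing group so that an appended quantifier applies to the whole atom rather than only to the trailing anchor.

```python
import re

samples = ["", "a", "\n", "foo\nbar"]

# r'\A\Z' should accept only the empty string, with or without MULTILINE.
empty_only = [s for s in samples if re.search(r"\A\Z", s, re.MULTILINE)]

# r'.\A' asks for a character before the start of the string.
before_start = [s for s in samples if re.search(r".\A", s, re.DOTALL)]

def grouped(atom):
    # Hypothetical helper: make the atom a single unit for a following quantifier.
    return "(?:" + atom + ")"

plus_variant = re.compile(grouped(r".\A") + "+")
star_variant = re.compile(grouped(r".\A") + "*")

print(empty_only, before_start)
print(plus_variant.search("abc"), star_variant.search("abc"))
```

One thing to keep in mind: a `*` quantifier permits zero repetitions, so the `*` variant still succeeds with an empty match even though the atom itself can never match.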

Peter Boughton
  • Yours looks nicer than mine since I assume it would exit out faster than using end of line. – ShuggyCoUk Jun 02 '09 at 18:00
  • Peter, you use \z (lower-case) while my Python pocket guide tells me the end-of-string assertion is \Z (upper-case)?! – ThomasH Sep 17 '10 at 10:54
  • ThomasH, they both are end of string, but the uppercase version allows a trailing newline whilst the lowercase one does not. – Peter Boughton Sep 17 '10 at 11:01
  • Mh, interesting, I find this nowhere documented. Also, _re.search("boo\z", "fooboo")_ doesn't return a match object, while _re.search("boo\Z", "fooboo")_ does. Rather, _re.search("boo\z", "foobooz")_ matches, which speaks to the fact that '\z' is simply interpreted as 'z', right?! (This is in Python 2.6). – ThomasH Sep 17 '10 at 12:54
  • Ah sorry, I thought Python was PCRE, but it turns out there's a few differences, and this is one of them. ( See 'Anchors' at http://www.regular-expressions.info/refflavors.html ) – Peter Boughton Sep 17 '10 at 14:07
  • Great. - As this question was somewhat Python-oriented, maybe you want to update your otherwise excellent answer. – ThomasH Sep 17 '10 at 16:17
5

Maybe '.{0}'?
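A quick check, assuming Python's `re`, suggests that this pattern matches the empty string rather than failing to match:

```python
import re

# '.{0}' means "exactly zero characters", which succeeds as an empty match.
m = re.search(r".{0}", "abc")
span = m.span() if m else None
print(span)
```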

Steef
3

You could use
\z..
This is the absolute end of the string, followed by two of anything.

If + or * is tacked on the end, this still works, refusing to match anything.
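A sketch of the same idea in Python. As the comments on another answer note, Python's `re` has not traditionally treated `\z` as an anchor, so this substitutes `\Z`, which in Python matches only at the very end of the string. The sample strings and the helper `hits` are illustrative assumptions.

```python
import re

samples = ["", "a", "ab", "abc\n", "foo\nbar"]

def hits(pattern):
    # DOTALL so that '.' may also consume a newline.
    compiled = re.compile(pattern, re.DOTALL)
    return [s for s in samples if compiled.search(s) is not None]

# With two dots, an appended '*' or '+' only affects the second dot,
# so at least one character is still required after the end of the string.
two_dots = {suffix: hits(r"\Z.." + suffix) for suffix in ("", "*", "+")}

# With a single dot, an appended '*' makes the dot optional.
one_dot_star = hits(r"\Z." + "*")

print(two_dots)
print(one_dot_star)
```

This also hints at why two characters are requested instead of one: with a single dot, the `*` case degenerates into an empty match at the end of the string.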

ShuggyCoUk
  • 1
    Why *two* of anything? IIRC `\z` doesn't allow trailing newlines, unlike `\Z`, so won't one suffice? Or is this a strange defense against `*` (why are you guarding against that?) – mpen Aug 29 '19 at 22:09
  • this felt like the best solution for my rust implementation which didn't support the negative look ahead that i initially preferred. for reasons i don't comprehend `.\A|.\A*|.\A+` actually matched any string. – grenade Feb 24 '21 at 08:00
0

Or, use some list comprehension to remove the useless regexp entries and join to put them all together. Something like:

re.compile('|'.join([x for x in [regexp1, regexp2, ...] if x is not None]))

Be sure to add some comments next to that line of code though :-)
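A fuller sketch of this approach, under the assumption that each fragment is stored together with the quantifier that would be appended to it. The function `build_pattern` and the `(fragment, suffix)` layout are hypothetical.

```python
import re

def build_pattern(parts):
    # parts: list of (fragment, suffix) pairs; a fragment of None means "disabled".
    active = [fragment + suffix for fragment, suffix in parts if fragment is not None]
    if not active:
        # Nothing enabled: an empty pattern would match the empty string
        # everywhere, so signal "no pattern" instead.
        return None
    return re.compile("|".join(active))

pattern = build_pattern([("cat", ""), (None, "*"), ("o", "+")])
print(pattern.pattern)
```

Returning None when every fragment is disabled is one possible design choice; the caller then has to skip matching explicitly.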

Mike Miller