Regular expression syntax for "match nothing"?

Question

I have a Python template engine that makes heavy use of regular expressions. It builds patterns by concatenation, like:

re.compile( regexp1 + "|" + regexp2 + "*|" + regexp3 + "+" )

I can modify the individual substrings (regexp1, regexp2 etc).

Is there a small, lightweight expression that matches nothing, which I can use inside a template where I don't want any matches? Unfortunately, '+' or '*' is sometimes appended to the regexp atom, so I can't use an empty string: that raises a "nothing to repeat" error.
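To make the failure concrete, here is a minimal sketch of that concatenation with an empty string in the middle slot. The helper name `build` and the sample atoms `foo` and `bar` are made up for illustration; only the shape of the concatenation comes from the question.

```python
import re

def build(regexp1, regexp2, regexp3):
    # Same concatenation shape as in the question.
    return regexp1 + "|" + regexp2 + "*|" + regexp3 + "+"

# With an empty middle atom, "*" has nothing in front of it to repeat.
pattern = build("foo", "", "bar")
try:
    re.compile(pattern)
    compile_error = None
except re.error as exc:
    compile_error = str(exc)

print(pattern, "->", compile_error)
```

So whatever goes into the slot has to be a real atom that the quantifier can legally attach to.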

http://stackoverflow.com/questions/1723182/a-regex-that-will-never-be-matched-by-anything — Paul Tomblin, Dec 30 '09 at 16:38
Could the title be better worded as "Regular expression to fail to match anything"? Matching nothing implies a successful match of an empty string. — BamaPookie, May 15 '18 at 18:03

Nadia Alramli · Accepted Answer · 2009-06-02T17:51:51.603

159

This shouldn't match anything:

re.compile('$^')

So if you replace regexp1, regexp2 and regexp3 with '$^', it will be impossible to find a match, unless you are using multiline mode.
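A quick check, assuming CPython's `re` module, of where '$^' does and does not match. The sample strings are arbitrary. Note the empty-string case, which several comments on this answer also point out.

```python
import re

never = re.compile('$^')
multi = re.compile('$^', re.MULTILINE)

# Ordinary text: the end of the string never coincides with its start.
print(never.search('abc'))
# Empty string: end and start are the same position.
print(never.search(''))
# Multiline mode: an empty line gives "$" and "^" a shared position.
print(multi.search('a\n\nb'))
```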

After some tests I found a better solution:

re.compile('a^')

It is impossible to match and will fail earlier than the previous solution. You can replace a with any other character and it will still be impossible to match.
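A small sanity check of 'a^', again assuming CPython's `re` module, where '^' stays an anchor even in the middle of a pattern. Other engines may treat it differently, as one comment below reports. The sample strings are arbitrary.

```python
import re

samples = ['', 'a', 'a^', 'abc', 'a\na', 'a\n\na']

for flags in (0, re.MULTILINE):
    never = re.compile('a^', flags)
    # Collect every sample the pattern manages to match.
    hits = [s for s in samples if never.search(s)]
    print(int(flags), hits)
```

Even in multiline mode the position after 'a' cannot be a line start, because the preceding character is 'a' rather than a newline.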

edited Jun 02 '09 at 17:51

answered Jun 02 '09 at 17:34

Nadia Alramli

That will not match anything for sure and is lightweight for regexp engine to process? (don't want my stub regexps to eat a lot of cpu) – grigoryvp Jun 02 '09 at 17:37
@Eye of hell. It should be lightweight. This will try to match a line end followed by a line start. Which is impossible in one line. – Nadia Alramli Jun 02 '09 at 17:46
But possible with multiple lines of course (depending on if the flag is enabled) - for a solution that works whether the flag is enabled or not, see my answer. – Peter Boughton Jun 02 '09 at 17:52
The regex "$^" matches the empty string, at least in some implementations. The second one is better. – Roman Starkov Nov 29 '10 at 18:06
@romkyns Second one does not match empty string in my call to PyQt4 `QtCore.QRegExp`. So bad, as it would surely have been lighter to execute. – Joël Jan 24 '14 at 13:27
This matches an empty string in ruby. – imnotquitejack Jul 07 '17 at 13:10
And in Java. Not sure why though. – shmosel Jan 01 '18 at 23:09
I tried using "a^", but this does not work: That string matches itself! Apparently a ^ that's not at the beginning of a regular expression matches a normal ^ character. Instead, I found this to work: "^(?!x)x" – Jean-François Larvoire May 18 '21 at 13:52
I'd recommend `re.compile('matchnothing^')` or similar to make it more readable for those of us not versed in regex-fu – beyarkay Sep 23 '22 at 10:33

Chas. Owens · Answer 2 · 2009-06-02T22:13:24.933

61

(?!) should always fail to match. It is the zero-width negative look-ahead. If what is in the parentheses matches then the whole match fails. Given that it has nothing in it, it will fail the match for anything (including nothing).
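For what it's worth, a minimal check in Python (assuming CPython's `re` module, which supports look-aheads). The sample strings and flag combinations are arbitrary.

```python
import re

samples = ['', 'a', 'abc', '\n', 'a\n\nb']

for flags in (0, re.MULTILINE | re.DOTALL):
    never = re.compile('(?!)', flags)
    # An empty negative look-ahead fails at every position, including in "".
    print(int(flags), [s for s in samples if never.search(s)])
```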

edited Jun 02 '09 at 22:13

answered Jun 02 '09 at 22:02

Chas. Owens

Right, I was just going to post this too. This is the best way, if your language supports lookaheads. Likewise (?=) matches every string. – Brian Carper Jun 02 '09 at 22:06

Peter Boughton · Answer 3 · 2010-09-17T17:15:45.583

18

To match an empty string - even in multiline mode - you can use \A\Z, so:

re.compile(r'\A\Z|\A\Z*|\A\Z+')

The difference is that \A and \Z are start and end of string, whilst ^ and $ can match start/end of lines, so $^|$^*|$^+ could potentially match a string containing newlines (if the flag is enabled).
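A short comparison of the bare anchors, assuming CPython's `re` module. The sample text is arbitrary; it just needs to contain an empty line.

```python
import re

text = 'a\n\nb'   # contains an empty line

# Line anchors can meet at the empty line when MULTILINE is on.
line_hit = re.search('$^', text, re.MULTILINE)
# String anchors ignore the flag, so non-empty text cannot match.
string_hit = re.search(r'\A\Z', text, re.MULTILINE)
# Only the empty string satisfies both string anchors at once.
empty_hit = re.search(r'\A\Z', '', re.MULTILINE)

print(line_hit, string_hit, empty_hit)
```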

And to fail to match anything (even an empty string), simply attempt to find content before the start of the string, e.g.:

re.compile(r'.\A|.\A*|.\A+')

Since no characters can come before \A (by definition), this will always fail to match.
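A sketch that checks the bare form and then probes the suffixed forms, assuming CPython's `re` module. As far as I know, CPython rejects a quantifier placed directly after an anchor such as \A with "nothing to repeat", so the snippet reports what happens rather than assuming it. The helper name `compile_status` is made up for illustration.

```python
import re

# Bare form: a character would have to sit before the start of the string.
never = re.compile(r'.\A', re.MULTILINE | re.DOTALL)
samples = ['', 'a', '\n', 'a\n\nb']
print([s for s in samples if never.search(s)])

def compile_status(pattern):
    """Report whether re accepts the pattern, without assuming either way."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return 'rejected: ' + str(exc)
    return 'compiles'

for suffix in ('*', '+'):
    print(repr(r'.\A' + suffix), compile_status(r'.\A' + suffix))
```

In engines that do accept `\A*`, the anchor becomes optional and the alternative degrades to a plain `.`, which would explain the Rust behaviour reported in a comment on another answer.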

edited Sep 17 '10 at 17:15

answered Jun 02 '09 at 17:45

Peter Boughton

Yours looks nicer than mine since I assume it would exit out faster than using end of line. – ShuggyCoUk Jun 02 '09 at 18:00
Peter, you use \z (lower-case) while my Python pocket guide tells me the end-of-string assertion is \Z (upper-case)?! – ThomasH Sep 17 '10 at 10:54
ThomasH, they both are end of string, but the uppercase version allows a trailing newline whilst the lowercase one does not. – Peter Boughton Sep 17 '10 at 11:01
Mh, interesting, I find this nowhere documented. Also, `re.search("boo\z", "fooboo")` doesn't return a match object, while `re.search("boo\Z", "fooboo")` does. Rather, `re.search("boo\z", "foobooz")` matches, which speaks to the fact that '\z' is simply interpreted as 'z', right?! (This is in Python 2.6). – ThomasH Sep 17 '10 at 12:54
Ah sorry, I thought Python was PCRE, but it turns out there's a few differences, and this is one of them. ( See 'Anchors' at http://www.regular-expressions.info/refflavors.html ) – Peter Boughton Sep 17 '10 at 14:07
Great. - As this question was somewhat Python-oriented, maybe you want to update your otherwise excellent answer. – ThomasH Sep 17 '10 at 16:17

score 5 · Answer 4 · answered Jun 02 '09 at 17:34

5

Maybe '.{0}'?
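A quick check of what this does in Python (assuming CPython's `re` module): zero repetitions of any character is a successful empty match, not a failure.

```python
import re

# ".{0}" asks for exactly zero characters, which succeeds immediately.
hit = re.search('.{0}', 'abc')
print(hit.span())

# It also succeeds on the empty string.
empty_hit = re.search('.{0}', '')
print(empty_hit)
```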

answered Jun 02 '09 at 17:34

Steef

This would match an empty string. – Gras Double Jan 20 '23 at 23:37

score 3 · Answer 5 · answered Jun 02 '09 at 17:58

3

You could use:
\z..
This is the absolute end of the string, followed by two of anything.

If + or * is tacked on the end, this still works, refusing to match anything.
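A sketch of the same idea in Python, using \Z as the end-of-string anchor (the spelling the comments under another answer settle on for Python's `re`; that substitution is my assumption, not part of this answer). The sample strings are arbitrary.

```python
import re

samples = ['', 'a', 'ab', 'abc\n', 'a\n\nb']

for suffix in ('', '*', '+'):
    never = re.compile(r'\Z..' + suffix, re.MULTILINE | re.DOTALL)
    # At least one character is always required after the end of the string.
    print(repr(suffix), [s for s in samples if never.search(s)])

# With a single dot, "*" would allow zero repetitions and match at the end.
single_dot = re.search(r'\Z.*', 'abc')
print(single_dot.span())
```

That last line is presumably why there are two dots: the quantifier only binds to the final one, so the first dot keeps the pattern impossible.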

answered Jun 02 '09 at 17:58

ShuggyCoUk

Why *two* of anything? IIRC `\z` doesn't allow trailing newlines, unlike `\Z`, so won't one suffice? Or this a strange defense against `*` (why are you guarding against that?) – mpen Aug 29 '19 at 22:09
this felt like the best solution for my rust implementation which didn't support the negative look ahead that i initially preferred. for reasons i don't comprehend `.\A|.\A*|.\A+` actually matched any string. – grenade Feb 24 '21 at 08:00

score 0 · Answer 6 · answered Jun 02 '09 at 21:56

Or, use a list comprehension to remove the useless regexp entries and join to put them all together. Something like:

re.compile('|'.join([x for x in [regexp1, regexp2, ...] if x is not None]))

Be sure to add some comments next to that line of code though :-)
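A slightly fuller sketch of that idea, which also handles the '*' and '+' suffixes from the question. Everything here is hypothetical: the function name `build_pattern`, the (atom, suffix) pair representation, the `(?:...)` wrapping, and the `(?!)` fallback (borrowed from another answer) are assumptions for illustration, not part of the original engine.

```python
import re

def build_pattern(parts):
    """Join (atom, suffix) pairs with "|", skipping pairs whose atom is None."""
    # Wrap each atom so the suffix applies to the whole atom, not its last char.
    kept = ['(?:' + atom + ')' + suffix
            for atom, suffix in parts if atom is not None]
    # If every atom was dropped, fall back to a pattern that cannot match.
    return '|'.join(kept) if kept else '(?!)'

pattern = build_pattern([('foo', ''), (None, '*'), ('bar', '+')])
compiled = re.compile(pattern)
print(pattern, compiled.search('xxbarbar').group())

nothing = build_pattern([(None, ''), (None, '*')])
print(nothing, re.compile(nothing).search('anything'))
```

Dropping the unused slots avoids the need for a "never matches" atom at all, except in the corner case where every slot is unused.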
