
I know this is quite a weird goal, but for a quick-and-dirty fix in one of our systems we need to not filter any input at all and let the corruption go into the system.

My current regex for this is "\^.*"

The problem is that it doesn't always work as planned, although it does work for most input. The string that makes it fail is ^@jj (basically anything that contains ^ ...).

What would be the best way to match no characters at all? I was thinking of removing the \, but doing only that would turn the "not" into a "starts with" ...

Erick

9 Answers


The ^ character doesn't mean "not" except inside a character class ([]). If you want to match nothing at all, you could use a negative lookahead around something that always matches: (?!.*).
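A quick check in Python's re module (Python is an assumption; the question doesn't name an engine). It shows the question's \^.* matching a literal caret, while (?!.*) never matches: .* can always succeed, even on zero characters, so the negative lookahead fails at every position.

```python
import re

# The original attempt: \^ is an escaped, literal caret, not "not".
original = re.compile(r"\^.*")
# Negative lookahead around .*: since .* always matches (possibly zero
# characters), the assertion fails at every position in every string.
nothing = re.compile(r"(?!.*)")

samples = ["", "abc", "^@jj", "line one\nline two"]

for s in samples:
    print(repr(s), bool(original.search(s)), bool(nothing.search(s)))
```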

JSBձոգչ
  • Seems to work! But this construct (?! <= ) is quite weird to me; what does it mean exactly? – Erick May 28 '10 at 15:21
  • @Erick, read this page for info about lookaround operators: http://www.regular-expressions.info/lookaround.html. – JSBձոգչ May 28 '10 at 15:24
  • @JS Bang Just started reading; it sounds like something quite advanced but necessary. Thanks for the solution! – Erick May 28 '10 at 15:27
  • This might be more efficient: /(?!a)a/ (Can't believe we're writing regexen to match nothing :)) – pilcrow May 28 '10 at 15:31
  • @pilcrow business reasons ;-) – Erick May 28 '10 at 15:37
  • @Erick, ha! If I may borrow from Pascal, the business has its reasons of which reason knows nothing. :) (Of course, reason alone usually doesn't lead to profit.) – pilcrow May 28 '10 at 15:50
  • Even more efficient: `(?!)` - the `.*` is optional anyway. – Tim Pietzcker Apr 14 '12 at 10:30
  • In terms of performance, is it a little better to do the negative lookahead for `.?` instead of `.*`, i.e. `(?!.?)`? – leorleor Jul 21 '14 at 21:43
  • @leorleor But `(?!.?)` will match an empty string, while the version with `.*` will not. – JSBձոգչ Jul 22 '14 at 13:14
  • @Erick re the earlier "sounds like something quite advanced but necessary": yeah, I mean, there are a lot of ways for text to be formatted and a lot of queries and `replace()`s you'd want to do with it. I realize this comment is very old :sweat_smile:, but I guess future readers can benefit from the discussion – Nathan majicvr.com Oct 03 '22 at 17:33
  • Ooh, these resources will be useful for the typical SO user who just wants a quick (~=5 mins) fix: 1. [generally useful Regex from that tutorial pilcrow linked](https://www.regular-expressions.info/quickstart.html) , 2. [a python use case for fellow pythonistas](https://stackoverflow.com/questions/180986/what-is-the-difference-between-re-search-and-re-match) :) – Nathan majicvr.com Oct 03 '22 at 17:40
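The variants from these comments can be compared quickly in Python's re (an assumption; other engines may differ, and the sample strings are arbitrary). In this check all four fail on every sample, including the empty string: .? can always match zero characters, just as .* can, so (?!.?) fails at every position too.

```python
import re

# Never-matching patterns suggested in the comments above.
variants = {
    "(?!.*)": re.compile(r"(?!.*)"),
    "(?!a)a": re.compile(r"(?!a)a"),  # "not a" and "a" at the same position
    "(?!)": re.compile(r"(?!)"),      # empty negative lookahead
    "(?!.?)": re.compile(r"(?!.?)"),  # .? can also match zero characters
}

samples = ["", "a", "aaa", "^@jj", "\n"]

for name, pattern in variants.items():
    hits = [s for s in samples if pattern.search(s)]
    print(f"{name:8} matches: {hits!r}")
```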

A simple and cheap regex that will never match anything is one that matches something simply unmatchable, for example: \b\B.

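It's simply impossible for this regex to match, since it's a contradiction: \b requires a word boundary at a position, and \B requires the absence of one at that same position.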
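A minimal check of \b\B in Python's re (Python and the sample strings are assumptions chosen for illustration):

```python
import re

# \b asserts a word boundary at the current position and \B asserts that
# there is none; both cannot hold at once, so the pattern never matches.
never = re.compile(r"\b\B")

samples = ["", "word", "two words", "^@jj", "a-b_c 123\n"]
hits = [s for s in samples if never.search(s)]
print(hits)  # prints []
```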


polygenelubricants
  • This is the usual solution, more widely-supported than lookaround. – bobince May 28 '10 at 15:30
  • OK, it's funny that there's a "usual solution" to this. =) – Michael H. May 28 '10 at 18:04
  • @MichaelH. It's actually really useful. Say I have a CLI with a "hide regex" parameter to exclude items from output, and I want it to be off by default. I could make the value empty by default and add extra logic to enable matching only when the user provides one, or simply set `\b\B` as the default value. – Alois Mahdal Jul 26 '19 at 17:49
  • [This solution](https://stackoverflow.com/a/2302992/719276) might be more efficient: `^\b$`. Maybe a mix of the two would also work in emacs `^\b\B$`. – arthur.sw Sep 24 '20 at 11:08
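The "hide regex" use case from the comments above might look like this in Python. The CLI, its --hide option and the visible() helper are hypothetical, sketched only to show \b\B working as a do-nothing default with no special-case branch.

```python
import argparse
import re

# Hypothetical CLI: --hide takes a regex; items it matches are left out.
# The default \b\B never matches, so nothing is hidden unless the user
# supplies a pattern, and no "was it set?" check is needed.
parser = argparse.ArgumentParser()
parser.add_argument("--hide", default=r"\b\B",
                    help="regex of items to exclude from output")


def visible(items, argv):
    """Return the items not matched by the --hide pattern."""
    args = parser.parse_args(argv)
    hide = re.compile(args.hide)
    return [item for item in items if not hide.search(item)]


items = ["alpha", "beta", "gamma"]
print(visible(items, []))                # default: nothing hidden
print(visible(items, ["--hide", "^b"]))  # hides "beta"
```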

Another very well supported and fast pattern, one that is guaranteed to fail to match anything in constant time:

$unmatchable pattern $anything goes here etc.

$ of course indicates the end of the line. No characters can possibly come after $, so no further state transitions can possibly be made. An additional advantage is that the pattern is intuitive, self-descriptive and readable as well!
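A minimal Python check of $unmatchable (the sample strings are arbitrary). It also shows one Python-specific caveat: $ matches just before a trailing newline, so $ followed by \n can match, whereas $ followed by an ordinary character cannot.

```python
import re

# "$" asserts end of string; an ordinary character after it can never be
# consumed, so this pattern fails on every input.
never = re.compile(r"$unmatchable")
samples = ["", "unmatchable", "text\nunmatchable", "ends with newline\n"]
hits = [s for s in samples if never.search(s)]
print(hits)  # prints []

# Caveat: in Python, $ also matches just before a trailing newline, so a
# newline placed after $ *can* be matched.
caveat = re.search(r"$\n", "ends with newline\n")
print(caveat is not None)  # prints True
```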

fatuhoku
  • The negative lookahead solution is fine as well: I think, as regexps go, people understand `(?!.*)` a lot better. `$whatever` has a WTF factor to it ;) – fatuhoku Apr 03 '14 at 11:26
  • This answer is more portable in that it also works in regex engines which don't *have* lookahead, although I think there are some multi-line modes where it can still match. I agree `$whatever` is a strange regex to look at, mostly because it looks like a variable expansion. But the same technique works the other way around - trying to match something before the start of the line: `whatever^`. – sqweek Jan 31 '16 at 09:25
  • $whatever only looks like a variable expansion in a limited number of languages like PHP and Perl; that won't be an issue if you're working in a codebase written in other languages. Good alternative! – fatuhoku Jan 31 '16 at 13:06
  • In many implementations of regex `$` only means end when it is at the end of the regex (otherwise it's a literal). – ebyrob Jun 09 '16 at 18:44

tl;dr: The most portable and efficient regex that never matches anything is $- (end of line followed by a character).


Impossible regex

The most reliable solution is to create an impossible regex. There are many impossible regexes, but not all are equally good.

First, you want to avoid "lookahead" solutions, because some regex engines don't support them.

Then you want to make sure your "impossible regex" is efficient and won't take too many computation steps to match... nothing.

I found that $- has a constant computation time (O(1)) and takes only two steps regardless of the size of your text (https://regex101.com/r/yjcs1Z/3).

For comparison:

  • $^ and $. both take 36 steps to compute -> O(1)
  • \b\B takes 1507 steps on my sample, and the count increases with the number of characters in your string -> O(n)
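Step counts depend on the engine and are not reproduced here, but correctness can be checked. This sketch uses Python's re (an assumption; the samples and flag combinations are arbitrary choices). It suggests $- and \b\B never match under any of the flags tried, while $^ matches the empty string and $. can match under DOTALL, because Python's $ also matches just before a trailing newline.

```python
import re

candidates = ["$-", "$^", "$.", r"\b\B"]
samples = ["", "some text", "a\n", "a\n\nb"]
flag_sets = {"none": 0, "MULTILINE": re.MULTILINE, "DOTALL": re.DOTALL}

# results[(pattern, flag_name)] = list of samples the pattern matched
results = {}
for pattern in candidates:
    for flag_name, flags in flag_sets.items():
        hits = [s for s in samples if re.search(pattern, s, flags)]
        results[(pattern, flag_name)] = hits
        print(f"{pattern:6} {flag_name:9} {hits!r}")
```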

Empty regex (alternative solution)

If your regex engine (or the tool wrapping it) accepts it and treats it as matching nothing, the best and simplest regex to never match anything might be: an empty regex.
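Whether an empty pattern means "match nothing" depends on the engine or the tool around it. In Python's re, used here as one example, an empty pattern does the opposite: it matches the empty string at every position, so it matches every input.

```python
import re

empty = re.compile("")
# An empty pattern succeeds immediately, matching zero characters.
for s in ["", "abc"]:
    m = empty.search(s)
    print(repr(s), m.span() if m else None)

# One empty match per position in "abc" (0, 1, 2 and 3).
print(len(empty.findall("abc")))  # prints 4
```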

aeon

Instead of trying to not match any characters, why not just match all characters? ^.*$ should do the trick. If you have to not match any characters, then try ^\j$ (assuming, of course, that your regular expression engine will not throw an error when you provide it an invalid character class; if it does, try ^()$). A quick test with RegexBuddy suggests that this might work.

Sean Vieira
  • `^\j$` matches j all by itself. `^()$` matches the empty string. – ebyrob Jun 09 '16 at 18:47
  • Depends on the regex implementation IIRC - some will throw, some will translate it to `j` and some will treat it as a non-existent character class (the engine that powers RegexBuddy, when I wrote this answer, for example). – Sean Vieira Jun 09 '16 at 21:49
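As the comments say, the result depends on the implementation. In Python's re (3.7 and later, used here as one example engine), ^.*$ matches any single-line input, \j is rejected as a bad escape, and ^()$ matches only the empty string:

```python
import re

# Matching everything: ^.*$ matches any single-line input, including "".
everything = re.compile(r"^.*$")
print([bool(everything.search(s)) for s in ["", "abc", "^@jj"]])

# Python 3.7+ rejects unknown letter escapes such as \j outright.
j_error = None
try:
    re.compile(r"^\j$")
except re.error as exc:
    j_error = exc
print("rejected:", j_error)

# ^()$ is an empty group between anchors: it matches the empty string.
empty_only = re.compile(r"^()$")
print(bool(empty_only.search("")), bool(empty_only.search("abc")))
```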

You want to match nothing at all? Negative lookarounds seem obvious but can be slow; perhaps ^$ (which matches only the empty string) as an alternative?
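A quick Python check of ^$ (Python's re is an assumption). Note that it does match one kind of input, the empty string, and in Python also a lone "\n", since $ matches just before a trailing newline:

```python
import re

empty_string_only = re.compile(r"^$")
for s in ["", "abc", "\n"]:
    # "\n" counts as well: $ may match right before a final newline.
    print(repr(s), bool(empty_string_only.search(s)))
```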

annakata

^ only means "not" inside a character class (such as [^a-z], meaning anything but a-z). With the backslash, you've turned it into a literal ^.

What you're trying to do is [^]*, but that's not legal. You could try something like

" {10000}"

which would match exactly 10,000 spaces; if that's longer than your maximum input, it should never match.
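A sketch of this approach in Python; MAX_INPUT is a hypothetical input-length limit standing in for whatever your system enforces. The pattern only "never matches" as long as that limit holds:

```python
import re

MAX_INPUT = 1000  # hypothetical maximum input length
# Exactly 10,000 consecutive spaces: unreachable if inputs are shorter.
too_long = re.compile(" {10000}")

samples = ["", "abc", " " * MAX_INPUT]
print([bool(too_long.search(s)) for s in samples])  # prints [False, False, False]

# The guarantee breaks if an input ever reaches the repetition count.
print(bool(too_long.search(" " * 10000)))  # prints True
```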

user308405
  • You don't say which regular expression flavor you're using; make sure it supports {} as a repetition count before trying this. It works in Python. – user308405 May 28 '10 at 15:22
((?iLmsux))

Try this; it matches only if the string is empty.
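Worth checking before relying on this: Python 3's re module, used here purely as an example engine, refuses to compile this exact pattern, because the inline L (locale) flag is only allowed in bytes patterns and is incompatible with u. Other engines may behave differently.

```python
import re

pattern_error = None
try:
    re.compile("((?iLmsux))")
except re.error as exc:
    # Python 3 rejects 'L' in str patterns (and 'L' with 'u' in general).
    pattern_error = exc

print("re.error:", pattern_error)
```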

user247702
Tushar Pal

Interesting ... the most obvious and simple variant:

~^

.

https://regex101.com/r/KhTM1i/1

requiring usually only one computation step (failing directly at the start, and being computationally expensive only if the matched string begins with a long series of ~), is not mentioned among all the other answers ... for 12 years.
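A quick Python check of ~^, reading the pattern as those two characters (treating the lone . line after it as punctuation is an assumption). After a literal ~ has been consumed, the engine can never be at the start of the string, nor, with MULTILINE, right after a newline, so the pattern cannot match:

```python
import re

samples = ["", "~", "~~~~", "a~b", "x\n~"]

# A literal "~" followed by a start anchor, tried with and without MULTILINE.
results = {
    flags: [bool(re.search(r"~^", s, flags)) for s in samples]
    for flags in (0, re.MULTILINE)
}
print(results)
```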

Claudio