
Do such regular expressions exist? To clarify my problem, here is an example:

ab would match the two occurrences of ab in abbacaba, and the regex I would like to find would match the rest of abbacaba, i.e. bac and the final a.

I can see from experimenting that the regex `(?!(?=a)ab)(?!(?<=a)b).` does satisfy the example above, but I would have to rewrite most of it if the starting regex (i.e. the one whose matches I want to exclude) changed from `ab` to, say, `abc`. Are there shorter alternatives that are easier to adapt?

shawn_xu
  • Hi Shawn! Welcome to StackOverflow! – Mark Aug 02 '23 at 04:42
  • Surely you could create these regexes programmatically, e.g. you make a regex "foo", then create ^foo – Mark Aug 02 '23 at 04:42
  • `.` would already match all single characters not matched by `ab`, because `ab` doesn't match a single character. – CAustin Aug 02 '23 at 04:45
  • I guess your current regex could be simplified to [`(?!ab).(?<!ab)`](https://regex101.com/r/5Z6JJd/1). To get adjacent matches, you can put it into a group and repeat it: [`(?:(?!ab).(?<!ab))+`](https://regex101.com/r/5Z6JJd/2). Bear in mind that in something like `(?=a)ab` the lookahead `(?=a)` is redundant: what else could there be besides `a` if `ab` matches? – bobble bubble Aug 03 '23 at 01:27
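Following Mark's suggestion to generate the regex programmatically, here is a minimal Python sketch that builds bobble bubble's pattern from a plain literal (the helper name build_simplified is hypothetical, and the unwanted text is assumed to be a literal, not a regex). It reproduces the expected matches for ab. Note, though, that checking only where a character starts or ends the unwanted text is not enough for longer literals: with abc, the middle b of an abc occurrence still matches.

```python
import re

def build_simplified(unwanted):
    """Hypothetical helper: builds (?:(?!X).(?<!X))+ for a literal string X."""
    x = re.escape(unwanted)  # treat the unwanted text literally
    # (?!X) rejects a character that starts X; (?<!X) rejects one that ends X
    return re.compile(f'(?:(?!{x}).(?<!{x}))+')

print(build_simplified('ab').findall('abbacaba'))  # ['bac', 'a']
print(build_simplified('abc').findall('abcd'))     # ['b', 'd'], the b inside abc leaks through
```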

3 Answers


You want the best regex trick (together with (*SKIP)(*FAIL)) and a tempered greedy token:

(w\hatev[e3]rY0uD()nt\Want\toMatch)
(*SKIP)(*FAIL)
|
(?:(?!(?1)).)+

(?1) is a recursive (subroutine) call that reuses the pattern enclosed in the first group. You can replace it with the expression itself if your flavor does not support recursion. The same goes for (*SKIP)(*FAIL): use whatever your language offers to discard the match when the first group participated in it.
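Python's built-in re module supports neither (?1) nor (*SKIP)(*FAIL), so here is a minimal sketch of that fallback, assuming the unwanted pattern (here ab) has no capturing groups of its own, so that it stays group 1. The function name complement_matches is hypothetical.

```python
import re

unwanted = 'ab'  # assumption: a regex with no capturing groups of its own

# Alternative 1 captures the unwanted text; alternative 2 is the tempered
# greedy token, with (?1) replaced by the expression itself.
pattern = re.compile(f'({unwanted})|(?:(?!{unwanted}).)+')

def complement_matches(text):
    # Discard every match in which group 1 participated; this plays the
    # role of (*SKIP)(*FAIL), since the unwanted text has been consumed.
    return [m.group(0) for m in pattern.finditer(text) if m.group(1) is None]

print(complement_matches('abbacaba'))   # ['bac', 'a']
print(complement_matches('ababcdeab'))  # ['cde']
```

Because finditer resumes searching right after each ab match, the b of an ab can never start a kept match, which is what (*SKIP) guarantees in PCRE.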

This has some particular advantages over splitting the string with w\hatev[e3]rY0uD()nt\Want\toMatch. For example, (ab)(*SKIP)(*FAIL)|(?:(?!(?1)).)+ matches bac and a in the first string below, and cde in the second:

abbacaba
ababcdeab

Try it on regex101.com.

Since + requires at least one character, the pattern never produces empty matches, so no filtering is needed. In contrast, your language's equivalent of .split(), if it has one (I'm looking at you, Lua), will typically return empty strings as well. Take Python, for example:

import re

print(re.split('ab', 'ababcdeab'))  # ['', '', 'cde', '']

If you want to match single characters, simply drop the quantifier:

(w\hatev[e3]rY0uD()nt\Want\toMatch)
(*SKIP)(*FAIL)
|
(?!(?1)).

Try it on regex101.com.
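Without (*SKIP)(*FAIL), the single-character version can be emulated in Python's re by capturing the unwanted text and discarding those matches. A minimal sketch, assuming the unwanted pattern has no capturing groups of its own; it also compares the result with the hand-written regex from the question. To exclude abc instead, only the string unwanted would change.

```python
import re

unwanted = 'ab'  # assumption: a regex with no capturing groups of its own

# Single-character variant: no + quantifier after the negative lookahead.
single = re.compile(f'({unwanted})|(?!{unwanted}).')
kept = [m.group(0) for m in single.finditer('abbacaba') if m.group(1) is None]

# The question's hand-written regex, for comparison
original = re.compile(r'(?!(?=a)ab)(?!(?<=a)b).')

print(kept)                          # ['b', 'a', 'c', 'a']
print(original.findall('abbacaba'))  # ['b', 'a', 'c', 'a']
```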

On the other hand, this trick may not be worth it. Just do a split and filter out anything you don't want, for your colleagues' sake.

InSync
  • It's a nice answer, but I'm having a hard time finding where PCRE is mentioned. However, THE TRICK could still be used by checking for captures on the program side, using e.g. [`ab|((?:(?!ab).)+)`](https://regex101.com/r/YDozuQ/1) (`ab` needs to be written twice, sadly). – bobble bubble Aug 03 '23 at 01:09
  • @bobblebubble This has no flavor tag, so I'm assuming it's a generic regex question where all flavors are in play. – InSync Aug 03 '23 at 01:13
  • I didn't mean to criticize, only to add that THE TRICK can also be used without PCRE verbs. We don't know the regex flavor. :) – bobble bubble Aug 03 '23 at 01:17
  • @bobblebubble In fact, I did :) It is right there in the second paragraph. – InSync Aug 03 '23 at 01:18
  • Sorry, I hadn't read all the text, you're right! It's a nice, elaborate answer and usage of THE TRICK anyway. It's late (early) here, please understand haha. – bobble bubble Aug 03 '23 at 01:21

One possible approach is to split the string on the regex: the matched substrings are discarded and the remaining fragments are returned. An example in Python:

import re

s = 'abbacaba'                  # original string
m = re.split('ab', s)           # split the string on 'ab'
l = [x for x in m if x]         # drop empty elements
print(l)

Output:

['bac', 'a']
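One caveat if the pattern to exclude is an arbitrary regex: when it contains capturing groups, re.split also returns the captured text as extra list elements. A minimal sketch that avoids this by slicing the string between match spans instead (the helper name fragments_between is hypothetical):

```python
import re

def fragments_between(pattern, s):
    """Hypothetical helper: text between matches of `pattern`, empty pieces dropped.

    Only each match's span is used, so capturing groups inside `pattern`
    do not leak into the result the way they do with re.split.
    """
    pieces = []
    last_end = 0
    for m in re.finditer(pattern, s):
        pieces.append(s[last_end:m.start()])  # text before this match
        last_end = m.end()
    pieces.append(s[last_end:])               # text after the last match
    return [p for p in pieces if p]

print(re.split('(a)b', 'abbacaba'))          # ['', 'a', 'bac', 'a', 'a']
print(fragments_between('(a)b', 'abbacaba'))  # ['bac', 'a']
```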
InSync
tshiono

You can use a regex with the following structure:

(?<=^|abc)(?:(?!abc).)+

Here:

  • (?<=^|abc) is a lookbehind that requires the match to start either at the beginning of the string (or line, in multiline mode) or right after abc,
  • (?!abc) is checked before each character and prevents the match from containing abc.

Note that if your pattern can be an arbitrary regex, it might need additional care before being used this way (for example, backreferences might cause trouble).

Additionally, since the initial pattern is used inside a lookbehind, your engine might not support this solution, depending on that pattern.
For example, quantifiers of non-fixed length inside a lookbehind aren't supported by PCRE or Python, and unlimited quantifiers (like +, * or {2,}) aren't supported by Java.

Demo here.
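In Python's built-in re module, even the lookbehind (?<=^|abc) is rejected, because its two alternatives have different widths (0 and 3 characters). A minimal sketch that moves the alternation outside the lookbehind, assuming the unwanted pattern is a fixed-width literal (the helper name runs_between is hypothetical):

```python
import re

def runs_between(unwanted, text):
    """Hypothetical helper: runs of text that start at the beginning of the
    string or right after `unwanted`, and never contain `unwanted`."""
    x = re.escape(unwanted)
    # (?:^|(?<=X)) keeps every lookbehind fixed-width, as re requires
    pattern = re.compile(f'(?:^|(?<={x}))(?:(?!{x}).)+')
    return pattern.findall(text)

print(runs_between('abc', 'xabcyzabcw'))  # ['x', 'yz', 'w']
print(runs_between('ab', 'abbacaba'))     # ['bac', 'a']
```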

markalex
  • If this answer is wrong in any way, it would be nice to know why – markalex Aug 02 '23 at 09:20
  • The OP asked for a regex pattern that accepts another regex pattern as input. Your solution seems to be designed only for literal string input. Suppose OP wanted to find everything that a more complex pattern such as `^.+(?>abc)$` doesn't find, for example. It's an unanswerable question because there's no single regex pattern that could accomplish this. – CAustin Aug 02 '23 at 17:06
  • @CAustin, how come it doesn't find it, if it [does](https://regex101.com/r/eUGtIN/1)? I agree that it was probably a good idea to mention lookbehinds and unlimited quantifiers, but I don't see how this makes the answer "not useful" or wrong. – markalex Aug 02 '23 at 17:23