272

Is there a NOT operator in regexes? For example, take this string: "(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)"

I want to delete every group matching \([0-9a-zA-z _\.\-:]*\) except the one that is a year: (2001).

So after the replacement, what remains should be: (2001) name.

NOTE: something like \((?![\d]){4}[0-9a-zA-z _\.\-:]*\) does not work for me (the (20019) somehow also matches...)

Carson
Sonnenhut
  • There is a string like the one above, and I want to apply a regex to it so that the result is: `(2001) name`. – Sonnenhut Sep 06 '11 at 09:01

4 Answers

410

Not quite, but you can usually work around it with one of these forms:

  • [^abc], which matches any single character that is not a, b, or c,
  • or negative lookahead: a(?!b), which is an a not followed by b,
  • or negative lookbehind: (?<!a)b, which is a b not preceded by a.
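
To make the three forms concrete, here is a minimal Java sketch. The class name `NegationForms`, the helper `mark`, and the sample inputs are made up for illustration; each match is replaced with `X` so the matched position is visible.

```java
public class NegationForms {
    // Replace every match with 'X' so the matched positions are visible.
    static String mark(String regex, String input) {
        return input.replaceAll(regex, "X");
    }

    public static void main(String[] args) {
        // [^abc]: any single character other than a, b or c.
        System.out.println(mark("[^abc]", "abcd"));   // abcX
        // a(?!b): an 'a' that is not followed by 'b'.
        System.out.println(mark("a(?!b)", "ab ac"));  // ab Xc
        // (?<!a)b: a 'b' that is not preceded by 'a'.
        System.out.println(mark("(?<!a)b", "ab cb")); // ab cX
    }
}
```

Note that in the two lookaround cases only the single letter is replaced: the neighbouring character is inspected but not made part of the match.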
Johan Sjöberg
196

No, there's no direct not operator. At least not the way you hope for.

You can use a zero-width negative lookahead, however:

\((?!2001)[0-9a-zA-z _\.\-:]*\)

The (?!...) part means "only match if the text following this point (hence: lookahead) does not (hence: negative) match this". It does not actually consume the characters it examines (hence: zero-width).
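
As a rough sketch of how this plays out on the string from the question, the Java snippet below compares a lookahead that only checks for four digits with one that also requires the closing parenthesis (the variant discussed in the comments on this answer). The class name `YearLookahead`, the helper `strip`, and the whitespace clean-up step are assumptions added for illustration; the character class is copied from the question as-is.

```java
public class YearLookahead {
    // Character class copied verbatim from the question. Note that the range
    // A-z also admits a few punctuation characters between 'Z' and 'a'.
    static final String BODY = "[0-9a-zA-z _\\.\\-:]+\\)";

    // Rejects any group that merely starts with four digits.
    static final String LOOSE = "\\((?!\\d{4})" + BODY;
    // Rejects only groups that are exactly four digits followed by ')'.
    static final String STRICT = "\\((?!\\d{4}\\))" + BODY;

    // Remove every matching group, then collapse the leftover whitespace
    // (the clean-up is an addition for readable output, not part of the answer).
    static String strip(String regex, String input) {
        return input.replaceAll(regex, "").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String s = "(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)";
        System.out.println(strip(LOOSE, s));  // (2001) name (20019)
        System.out.println(strip(STRICT, s)); // (2001) name
    }
}
```

Because the lookahead is zero-width, the digits it inspects are still available to the character class that follows, which is why `(20019)` can be matched and removed by the stricter pattern.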

Lookarounds come in four combinations along two axes:

  • lookbehind / lookahead : specifies if the characters before or after the point are considered
  • positive / negative : specifies if the characters must match or must not match.
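
The four combinations can be lined up side by side in a small Java sketch. The class name `LookaroundGrid`, the helper `mark`, and the sample input `"ab cb ad"` are hypothetical, chosen only so that each pattern hits exactly one position.

```java
public class LookaroundGrid {
    // Replace every match with 'X' so the matched position is visible.
    static String mark(String regex, String input) {
        return input.replaceAll(regex, "X");
    }

    public static void main(String[] args) {
        String s = "ab cb ad";
        // Positive lookahead: an 'a' that IS followed by 'b'.
        System.out.println(mark("a(?=b)", s));  // Xb cb ad
        // Negative lookahead: an 'a' that is NOT followed by 'b'.
        System.out.println(mark("a(?!b)", s));  // ab cb Xd
        // Positive lookbehind: a 'b' that IS preceded by 'a'.
        System.out.println(mark("(?<=a)b", s)); // aX cb ad
        // Negative lookbehind: a 'b' that is NOT preceded by 'a'.
        System.out.println(mark("(?<!a)b", s)); // ab cX ad
    }
}
```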
ikegami
Joachim Sauer
  • Thank you, the ?! is what I was suggesting too, but if I use `\((?![\d]{4})[0-9a-zA-z _\.\-:]+\)` there is still `(20019)` in the result. – Sonnenhut Sep 06 '11 at 08:58
  • In the edit of your question you put the `{4}` *outside* the lookahead and in this comment you put it *inside*: which one did you try? Also: if you want `(20019)` to match, then you must add the `\)` *inside* your lookahead: `\((?![\d]{4}\))[0-9a-zA-z _\.\-:]+\)` – Joachim Sauer Sep 06 '11 at 09:00
  • With the regex above in your comment, it works. But I don't understand that... I don't get why you escape the following part ``\((?![\d]{4} -->\)<--)[0-9a-zA-z _\.\-:]+\)`` Then there is a bracket not closed, isn't it? – Sonnenhut Sep 06 '11 at 09:10
  • I escape the closing parenthesis `)` because I want to match the **literal** character `)` (just as you do at the very beginning and the very end of your regex!). Then *after* I matched that, I end the lookahead by using an unescaped `)`. – Joachim Sauer Sep 06 '11 at 09:13
  • Got it. I was a little bit confused by all those characters. Thank you. – Sonnenhut Sep 06 '11 at 09:17
  • Pardon the edit. Already handled in comments, so I reverted it. – ikegami Sep 06 '11 at 17:23
  • If you don't need to test for anything else afterwards, then the pure '(?!...)' sequence works quite well. – Alexander Stohr Aug 24 '20 at 09:42
1

You could capture the (2001) part and replace the rest with nothing.

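public class ExtractYear {
    // Replace the whole input with the four-digit year captured in group 1.
    public static String extractYearString(String input) {
        // Backslashes are doubled because this is a Java string literal.
        return input.replaceAll(".*\\(([0-9]{4})\\).*", "$1");
    }

    public static void main(String[] args) {
        String subject = "(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)";
        String result = extractYearString(subject);
        System.out.println(result); // <-- "2001"
    }
}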

.*\(([0-9]{4})\).* means

  • .* match anything
  • \( match a ( character
  • ( begin capture
  • [0-9]{4} any single digit four times
  • ) end capture
  • \) match a ) character
  • .* anything (rest of string)
birgersp
1

Here is an alternative:

(\(\d{4}\))((?:\s*\([0-9a-zA-z _\.\-:]*\))*)([^()]*)(( ?\([0-9a-zA-z _\.\-:]*\))*)

Each repeated pattern is wrapped in a single capturing group whose inner group is non-capturing: ((?:pattern)*). This keeps control over the numbering of the groups of interest.

Then you get what you want with: \1\3
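
For illustration, this is how the replacement might look in Java, where group references in the replacement string are written `$1$3` rather than `\1\3`. The class name `KeepYearAndName`, the method name, and the final `trim()` are assumptions added here; the pattern itself is copied from the answer unchanged.

```java
public class KeepYearAndName {
    // Pattern copied verbatim from the answer; backslashes are doubled
    // because this is a Java string literal.
    static final String REGEX =
        "(\\(\\d{4}\\))((?:\\s*\\([0-9a-zA-z _\\.\\-:]*\\))*)([^()]*)(( ?\\([0-9a-zA-z _\\.\\-:]*\\))*)";

    // Keep group 1 (the year) and group 3 (the text outside parentheses).
    // trim() is an addition: group 3 also captures the surrounding spaces.
    static String keepYearAndName(String input) {
        return input.replaceAll(REGEX, "$1$3").trim();
    }

    public static void main(String[] args) {
        String s = "(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)";
        System.out.println(keepYearAndName(s)); // (2001) name
    }
}
```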

lalebarde