
I have this situation (Java code): 1) a string such as "A wild adventure" should match; 2) a string with adjacent repeated words, such as "A wild wild adventure", should not match.

With this regular expression: .* \b(\w+)\b\s*\1\b.* I can match strings containing adjacent repeated words.

How do I reverse this, i.e. how do I match strings that do not contain adjacent repeated words?

nash

1 Answer


Use a negative lookahead assertion, (?!pattern).

    String[] tests = {
        "A wild adventure",      // true
        "A wild wild adventure"  // false
    };
    for (String test : tests) {
        System.out.println(test.matches("(?!.*\\b(\\w+)\\s\\1\\b).*"));
    }

Explanation courtesy of Rick Measham's explain.pl:

REGEX: (?!.*\b(\w+)\s\1\b).*
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      \w+                      word characters (a-z, A-Z, 0-9, _) (1
                               or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))

Note

Negative assertions only make sense when there are also other patterns that you want to positively match (see examples above). Otherwise, you can just use the boolean complement operator ! to negate the result of matching with whatever pattern you were using before.

    String[] tests = {
        "A wild adventure",      // true
        "A wild wild adventure"  // false
    };
    for (String test : tests) {
        System.out.println(!test.matches(".*\\b(\\w+)\\s\\1\\b.*"));
    }

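
To illustrate the note above, here is a minimal sketch of a case where the negative lookahead earns its place: the string must positively match a pattern (here, letters and spaces only, an assumption chosen just for illustration) and must also not contain adjacent repeated words. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class LettersOnlyNoRepeats {
    // Negative lookahead taken from the answer above, followed by a positive
    // requirement (assumed for this example): only ASCII letters and spaces.
    private static final Pattern VALID =
        Pattern.compile("(?!.*\\b(\\w+)\\s\\1\\b)[A-Za-z ]+");

    public static boolean isValid(String s) {
        // matches() requires the whole string to fit the pattern
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] tests = {
            "A wild adventure",      // true
            "A wild wild adventure", // false: adjacent repeated word
            "A wild adventure 2"     // false: contains a digit
        };
        for (String test : tests) {
            System.out.println(isValid(test));
        }
    }
}
```

With a plain `!` the two conditions would need two separate checks; the lookahead lets both live in one pattern.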
polygenelubricants
  • wow, thanks! You answered my question in a few minutes. Kudos. I was trying out negative lookahead to just \b(\w+)\b\s*\1\b and to \1\, that's why I wasn't getting the required results. thanks again. – nash May 21 '10 at 03:21
  • @nash: you're welcome. Also, I just realized that I accidentally switched it to `\s` instead of `\b\s*`; you'd want to use just `\s+` instead. – polygenelubricants May 21 '10 at 03:31
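
The `\s+` correction from the last comment can be sketched as follows. This is only an illustration, with a hypothetical class and method name: with a single `\s`, repeated words separated by more than one whitespace character would slip through, while `\s+` catches any run of whitespace. The trailing `\b` still keeps a case like "wild wilderness" from counting as a repeat.

```java
import java.util.regex.Pattern;

public class NoAdjacentRepeats {
    // Same negative lookahead as in the answer, but with \s+ so that one or
    // more whitespace characters between the repeated words are detected.
    private static final Pattern NO_REPEAT =
        Pattern.compile("(?!.*\\b(\\w+)\\s+\\1\\b).*");

    public static boolean hasNoAdjacentRepeat(String s) {
        return NO_REPEAT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] tests = {
            "A wild adventure",        // true
            "A wild wild adventure",   // false
            "A wild   wild adventure", // false: several spaces between the words
            "A wild wilderness"        // true: "wilderness" is a different word
        };
        for (String test : tests) {
            System.out.println(hasNoAdjacentRepeat(test));
        }
    }
}
```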