0

I found a collection of words on this website about regular expressions. I tried number 5 and managed to match the opposite of what I need.

These should not be matched:

abba
anallagmatic
bassarisk
...

These should be matched:

acritan
aesthophysiology
amphimictical
...

This pattern matches the inverse:

([a-z])([a-z])\2\1

Unfortunately, I do not know how to negate it. I read about this:

(?!([a-z])([a-z])\2\1)

But it seems that simple nesting does not work. Is nesting of groups supported when using regular expressions?

What could it do?

Xiphias
  • 4,468
  • 4
  • 28
  • 51
  • Don't let the readers need to refer to the linked page. Write what the pattern is that you are dealing with. – sawa Mar 12 '14 at 15:14
  • 1
    I did, didn't I? The reference is just for completeness. – Xiphias Mar 12 '14 at 15:36
  • No, you didn't. You just gave lists without explaining why they match or not. – sawa Mar 12 '14 at 15:38
  • @sawa That's how the problem in question works. You are given two lists and have to figure out how to match one but not the other, using the shortest regex you can find. It's called "regex golf." In this case, it's pretty obvious that you want to match words that *don't* have an `abba` (or `itti`, or whatever) pattern in their letters. – elixenide Mar 12 '14 at 15:49
  • @EdCottrell It is not obvious. But thanks for clarifying. – sawa Mar 12 '14 at 15:51
  • @sawa I didn't mean to offend. I apologize if I did. I just meant to explain that this is a puzzle; the goal is to figure out the rule that applies and either write a short regex applying that rule or, if it's shorter, a regex that ignores the rule but just happens to work. – elixenide Mar 12 '14 at 15:56
  • @EdCottrell I didn't take it as offensive. I just have different opinion from you. With your comments, I came to have a clearer idea, but without that, the question was not clear at all. The OP should have written in the question what you wrote in the comment. – sawa Mar 12 '14 at 15:58
  • @sawa Yes, you are right. Now I got what you mean. If you didn't get it in the first place, it would be quite confusing. Should have mentioned it! – Xiphias Mar 12 '14 at 22:26

2 Answers2

5

Answer to Your Question

You have to get a little fancy with it:

^((?!([a-z])([a-z])\3\2).)+$

Regular expression visualization

Debuggex Demo

Some Hints

Since this is coming from regex golf puzzle 5 at http://regex.alf.nu/, I'll give you a couple of hints.

First, some clarification for readers not familiar with these puzzles: this is a puzzle, based on xkcd comic 1313. It's called "regex golf." You are given two lists and have to figure out how to match all elements in one but none in the other, using the shortest regex you can find. On the website in question, most of the puzzles have a pattern in one of the lists, and the goal is to figure out the rule that applies and either write a short regex applying that rule or, if it's shorter, a regex that ignores the rule but just happens to work. In this case, you want to match words that don't have an abba (or itti, or whatever) pattern in their letters.

Hint 1: This is shorter, because it replaces [a-z] with \S:

^((?!(\S)(\S)\3\2).)+$

Regular expression visualization

Debuggex Demo

Hint 2: While both regexes above work, neither is by any means the shortest working regex for the puzzle. The shortest working regex is a "cheat" in that it doesn't literally match the pattern, but nonetheless discriminates correctly between lists.

Community
  • 1
  • 1
elixenide
  • 44,308
  • 16
  • 74
  • 100
  • This is cool. I just don't get why we need `?:`. I thought that it disables backreferences to that group. Isn't that correct? I interpret it as: "Match any character (`.`) preceded by the pattern (like abba) at least once. If it is true, the word will be skipped due to `?!`." Doesn't `^` require the pattern to be at the beginning? And doesn't `$` require the pattern to be at the end? Is this different here because of the `?:`? – Xiphias Mar 12 '14 at 15:38
  • You're right; we don't need `?:`. Sorry; forgot I was regex golfing for a moment! :) If you take out `?:`, however, you have to change the backrefs from `\2\1` to `\3\2`, because the outermost `(...)` group becomes `\1`. See my edited version above. – elixenide Mar 12 '14 at 15:46
3

This has been answered here:

How to negate the whole regex?

Or another idea is that you can do it in code. For example, in C#..

strin gtext = "abba";
Regex r = new Regex("([a-z])([a-z])\2\1");
if(!r.IsMatch(text))
//Then do something

With ! you are saying the same that if Is Not Match

Community
  • 1
  • 1
Oscar Bralo
  • 1,912
  • 13
  • 12
  • 1
    I think this is the most efficient approach... Trying to "negate" the regex is troublesome (but not impossible as Ed Cottrell's answer attempts to show) because the semantics of matching are along the lines of "I found one definite match to my RE" - finding one match to your "negated RE" does not necessarily prove that there isn't a match to your "positive RE" later in the string. – twalberg Mar 12 '14 at 15:44
  • 2
    This is a clever solution! For the quiz, it's not applicable, but in a real-world scenario this would probably be an elegant solution. – Xiphias Mar 12 '14 at 22:27