7

I would have thought this problem to be impossible; as far as I know, Javascript's regex flavor has niether recursive interpolation nor the nifty .NET balancing groups feature. Yet there it is, as problem 12 on regex.alf.nu: match balanced pairs of < and >. Unless there's some other pattern in the sets I'm not getting.

So... is this possible? If so, how?

NOTES:

  1. I know that this is impossible for true regular expressions, but based on the challenge it seems that it must be possible in Javascript's flavor (which is at least irregular enough to have backreferences). I just don't know of any feature that would let them do this.

  2. No other code - the form allows entry of a single regex, which is evaluated against the test strings on the page. I could try to crack the page to break out of the regex and into raw JS, I suppose, but that doesn't seem to be in the spirit of this challenge.

Since David asked, here are the test strings. Longer ones have been truncated with a character count, but the problem is entitled "Balance" and the ones that are complete certainly support the hypothesis that the "match" column has balanced pairs of < and > while the "not" column doesn't.

Match all of these…

<<<<<>><<>>><<... [62 chars]
<<<<<>><>><<><... [110 chars]
<<<<<>><>><>><... [102 chars]
<<<<<>><>>>><<... [88 chars]
<<<<<>>><<<>><... [58 chars]
<<<<<>>><<><>>... [152 chars]
<<<<<>>><<>><<... [42 chars]
<<<<<>>><>><<<>>>><<>>
<<<<<>>>><<<<>... [102 chars]
<<<<<>>>><<<><... [30 chars]
<<<<<>>>><><<<... [66 chars]
<<<<<>>>><><<<... [124 chars]
<<<<<>>>><>><<>>
<<<<><<>>><<<>... [34 chars]
<<<<>><<<>>>><... [92 chars]
<<<<>>><<<<>><>><<<>>>>>
<<<<>>><<<><<>>><><<>>>><<>>
<<<<>>><<><<<>... [84 chars]
<<<<>>>><<<><<... [52 chars]
<<<><<<>>>><<<... [50 chars]
<<<><<><>>>>
<<<><>><<<>>>>
<<<>><<<><<>>>... [44 chars]
<<<>><><<<><>>... [48 chars]
<<<>>><<><<<<>>>><<><<<>>>>>
<<><<<<>><>>>>... [60 chars]
<<>>
<<>><<<<<>>>>>... [54 chars]
<<>><<<<>><<<>... [74 chars]
<>
<><>

and none of these…

<
<<<<<<>>><<><>>>>>><<>
<<<<<>>><>>><<<>>>><>>
<<<<<>>>>>>
<<<<>><<<<<><<>><><<<<
<<<>><<<<><><><><
<<<>>>><><<<><>
<<><<<<><<><<>>><<
<<><<<>>>>><<
<<>>><<<>>
<><<<>><<>>><<>
<><<>>><<<><>><<<>>><<>>>><
<><<>>><><<<>
<><>><>>><><<<... [36 chars]
<>><><<<><>
<>>>>>><<<>><<>><><
<>>>>>>><<<
>
><
><<<>><><<<><<
><<<>>>><><<<<><>>><<><><<
><<><<<<><<<<>>>><
><><><<<>>>>>
><><>>><>><>
><><>>>><>>>>>>><>>><>>
><>><<<<<>>
><>><><><<>><<>>><<
><>>><>>>>><<><<<><>><>><<<
>><<<><<<<<<><>><<
>><>>><<<><>>><><<>><<><><<
>>>><>><>>>><>>><>><><
>>>>><<<>>>
Mark Reed
  • 91,912
  • 16
  • 138
  • 175
  • 1
    Are you allowed to use other JavaScript code? If so, just implement a stack of 'start' matches then unwind them as you find the 'end' matches. – David-SkyMesh Dec 22 '13 at 06:19
  • 1
    The language you describe is not regular. – 000 Dec 22 '13 at 06:27
  • How about you paste the set of [matche these] & [but not these] into your question? – David-SkyMesh Dec 22 '13 at 06:30
  • 1
    @JoeFrambach: True, but JavaScript "regular expressions", despite the name, do allow you to express certain non-regular languages. For example, the language given by `/^(a+)b\1$/` is not regular. – ruakh Dec 22 '13 at 06:31
  • For those of us not familiar with the challenge, can you fix up the example to not include the [] stuff that isn't supposed to get matched? – brandonscript Dec 22 '13 at 06:43
  • Maybe this will help [Checking for balanced brackets in JavaScript](http://codereview.stackexchange.com/questions/14532/checking-for-balanced-brackets-in-javascript) – hwnd Dec 22 '13 at 06:45

1 Answers1

7

I do not believe that this is possible in JavaScript, though it's hard to prove. For example, Java and PHP do not have the features that you mention (recursive interpolation, balancing groups), but this fascinating Stack Overflow answer shows how to match anbn using regexes in those languages. (Adapting that answer to the present case, the Java regex ^(?:(?:<(?=<*(\1?+>)))+\1)*$ should work. Correction: no, it's not that easily adapted.) But that answer depends on Java's support for the possessive quantifier ?+ (like ? except that you can't backtrack into it), and JavaScript doesn't have that.

That said, you can solve the referenced puzzle by writing this:

^(?:<(?:<(?:<(?:<(?:<(?:<(?:<>)*>)*>)*>)*>)*>)*>)*$

which matches up to seven levels of nesting. That's the most that any of the strings has, so it's all you need. (Several of the other puzzles at that page advise you to cheat because they're asking for something technically impossible; so while an elegant solution would obviously be more appealing, there's no reason to assume that one exists.)

Community
  • 1
  • 1
ruakh
  • 175,680
  • 26
  • 273
  • 307
  • I read the source code of the page and all it checks for is if the regex works. As you've said, nothing suggests that a more elegant answer exists nor would it award higher points. – Rhyono Dec 22 '13 at 06:53
  • Java also cannot do bracket matching. We need recursive regex (subroutine) syntax, which Java does not support (AFAIK). – nhahtdh Dec 26 '13 at 01:20