
I thought the community had helped me nail this problem with a case-insensitive RegExp, but I got it wrong. What about the following RegExp fails in IE7 and IE8?

var reggy = /(\s*?)<span\b(?:.*?)(?:class=(?:'|"|.*?\s)?foobar(?:\s|\3))(?:.*?)(?:\/)?>(.+?)<\/span>(\s*?)/ig;

jsFiddle here. Only in IE7 and IE8 does it give a "did not match" result.

buley
  • just gonna guess: `(?:)` – zzzzBov Nov 09 '11 at 19:46
  • What is it you're trying to do? Perhaps a regular expression isn't the best solution to this. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – tvanfosson Nov 09 '11 at 19:49
  • This looks like a ridiculous regular expression. There is no point in overcomplicating everything; you should just do this procedurally. It also looks like you are trying to use regex to identify HTML, which is _wrong_ http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html – GAgnew Nov 09 '11 at 19:55
  • I've seen that post. I'm not trying to parse an HTML document, I'm trying to pattern match a single HTML node. Do you think I should be using an HTML parser? – buley Nov 09 '11 at 19:55
  • Changing from `(?:class=(?:'|"|.*?\s)?foobar` to `(class=('|"|.*?\s)?foobar` is doing the trick. Still have no clue why. – buley Nov 09 '11 at 20:20

2 Answers


There are several problems with that regex, the worst of them being that you seem to be mixing up capturing and non-capturing groups. As Mike Samuel hinted, the third capturing group is the (\s*?) at the very end (which, like the one at the beginning, served no useful purpose). Try this regex:

/<span\b[^>]*\bclass=\s*(['"]?)forbes_entity\1[^>]*>[\s\S]*?<\/span>/ig

Here there's only the one capturing group; it captures a single-quote, a double-quote, or nothing. After the class name, the \1 matches the same thing again. (I changed the class name to match the sample text in your fiddle.)

It turned out I didn't need any other groups, but if I had needed them I would have used non-capturing groups ( (?:...) ) to make it easier to keep track of the capturing-group numbers. I also used [\s\S] instead of . to match the span's contents, in case it contains any newlines.
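As a rough illustration of how the single capturing group and `\1` behave, here is a minimal sketch that runs the regex above against some made-up markup. The `sample` string and the variable names are assumptions for demonstration only; they are not taken from the fiddle.

```javascript
// Pattern from the answer: one capturing group holds the optional quote,
// and \1 requires the same quote (or nothing) after the class name.
var spanPattern = /<span\b[^>]*\bclass=\s*(['"]?)forbes_entity\1[^>]*>[\s\S]*?<\/span>/ig;

// Hypothetical sample markup (not the text from the fiddle).
var sample = 'a <span class="forbes_entity">Acme</span> b ' +
             "<SPAN id=x class='forbes_entity'>two\nlines</SPAN> c " +
             '<span class=forbes_entity>bare</span> d ' +
             '<span class="other">skip</span>';

// With the g flag, match() returns every matching span as a string.
var matches = sample.match(spanPattern) || [];

for (var i = 0; i < matches.length; i++) {
  console.log(JSON.stringify(matches[i]));
}
```

The double-quoted, single-quoted, and unquoted spans are all matched, the second one across a newline thanks to `[\s\S]`, while the span with `class="other"` is skipped. Note that this pattern only matches when `forbes_entity` is the sole class name in the attribute.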

Alan Moore
  • Thanks for the advice. I also tried some variations with fewer capturing groups, and one interesting thing I noticed was that there seemed to be a limit on the number of capturing groups. Having fewer is likely merited on its own, but if there really is some limit then this is especially true. – buley Nov 10 '11 at 01:36

\3 looks suspicious: the third capturing group comes after it, so it can never match anything but the empty string. Could IE be treating the \3 before the third capturing group as an octal escape, i.e. as equivalent to \u0003?

In older versions of IE, \s had a non-standard meaning -- it did not match \u00A0 for example.
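Both points can be probed with a small test, sketched here with a made-up three-group pattern rather than the regex from the question. It only shows what the engine running it does; it does not reproduce IE7 or IE8 behavior, which remains a guess in this answer.

```javascript
// Three capturing groups, with \3 placed before group 3 (a forward reference).
var forwardRef = /(a)(b)\3(c)/;

// If \3 matches the empty string, plain 'abc' matches.
var matchesPlain = forwardRef.test('abc');

// If \3 were read as an octal escape for \u0003, this input would match instead.
var matchesOctal = forwardRef.test('ab\u0003c');

// Check whether \s covers the no-break space U+00A0.
var nbspIsSpace = /\s/.test('\u00A0');

console.log(matchesPlain, matchesOctal, nbspIsSpace);
```

A current, spec-compliant engine prints `true false true`. An engine that printed `false true` for the first two values would be treating the forward reference as an octal escape.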

Mike Samuel
  • Or maybe the older IEs are treating it as an error because it's a forward reference. I think the ECMAScript standard says that it should simply succeed without consuming any characters because the group it references has not yet participated in the match. Maybe IE wasn't following that rule before. – Alan Moore Nov 09 '11 at 23:28
  • @AlanMoore, I thought the spec says the initial value of a group is blank and they are reset every time a containing repetition is entered, but I guess that arrives at the same conclusion. – Mike Samuel Nov 10 '11 at 00:26