
I'm attempting to integrate John Gruber's "An Improved Liberal, Accurate Regex Pattern for Matching URLs" into one of my JavaScript files, but WebKit's inspector (in Google Chrome 5.0.375.125 for Mac) gives an "Invalid group" regular expression syntax error.

Gruber's original regexp is as follows:

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

The line from my JavaScript w/the regexp is as follows (w/forward slashes backslash-escaped):

tweet_text = tweet_text.replace(/(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/gi, '<a href="$1">$1</a>');

And the Google Chrome (V8?) error is as follows:

Uncaught SyntaxError: Invalid regular expression: /(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/: Invalid group

And the Safari error is as follows:

SyntaxError: Invalid regular expression: unrecognized character after (?

He claims it should work in modern JavaScript regexp interpreters, which I'd assume WebKit & V8 would be. Does JavaScript's regexp syntax not support the (?: (damn Google for not indexing punctuation!) grouping syntax? Did I just miss escaping something?

morgant

1 Answer


Gah, it was the mode modifier (i.e. the (?i)) at the beginning of the regex!

I went through Regular-Expressions.info's details on "JavaScript's Regular Expression Flavor", specifically the list of what's not supported, and there was the mode modifier. I had already specified case-insensitivity with the `i` flag after the closing forward slash of the regex, so I ripped the `(?i)` out and all seems well.
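For anyone who wants to check this in a console, here is a minimal sketch. The `compiles` helper is a hypothetical name of mine, not part of any library; it only reports whether the engine accepts a pattern. It suggests that the `(?:` non-capturing group is fine and that only the leading `(?i)` is rejected.

```javascript
// Hypothetical helper: returns true if `source` compiles as a JavaScript
// regex with the given flags, false if the engine throws a SyntaxError.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected, so let it surface
  }
}

// Inline mode modifier at the start of the pattern: rejected.
const withInlineModifier = compiles("(?i)\\bwww", "");

// Same pattern with case-insensitivity moved to the trailing flag: accepted.
const withFlag = compiles("\\bwww", "i");

// Non-capturing groups are supported, so they were never the problem.
const nonCapturing = compiles("(?:www)", "");

console.log({ withInlineModifier, withFlag, nonCapturing });
```

The trailing `i` flag does the same job the `(?i)` prefix does in other flavors, so nothing is lost by removing it.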

So, my JavaScript regex is now as follows:

/\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/gi
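Dropped back into the `replace` call from the question, it can be used roughly like this. This is a sketch: `URL_PATTERN` and `linkify` are names I made up for illustration, and the sample tweet text is invented.

```javascript
// The pattern from this answer: no (?i) prefix, case-insensitivity
// comes from the trailing i flag instead.
const URL_PATTERN = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/gi;

// Hypothetical helper: wraps every matched URL in an anchor tag.
// $1 is the outermost capture group, i.e. the whole URL.
function linkify(text) {
  return text.replace(URL_PATTERN, '<a href="$1">$1</a>');
}

const linked = linkify("See http://example.com/page.");
console.log(linked);
// → See <a href="http://example.com/page">http://example.com/page</a>.
```

Note that the trailing period stays outside the link. The sketch does no HTML escaping of the matched text, so it is only suitable for input you already trust or have escaped.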
morgant
  • I'm actually having trouble matching 'example.com'. 'http://example.com' and 'www.example.com' both work. Do you have any ideas? – Samuel Cole Apr 14 '11 at 15:58
  • By removing the \/ at the end of the third option for the domain name, I can make it work: `\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4})(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))` – Samuel Cole Apr 14 '11 at 16:10
  • Ah, I see that Gruber intentionally left out the example.com case. It seems like a common one, though. – Samuel Cole Apr 14 '11 at 16:15
  • I agree that it's a common case and would be useful. That said, I understand Gruber's not wanting to match against specific TLDs for flexibility & forward compatibility and also not wanting it to match `filename.ext`. – morgant Apr 14 '11 at 18:46
  • After working on this over the day, I ended up with: https://gist.github.com/920312 – Samuel Cole Apr 14 '11 at 20:58