I am trying to use a regular expression that identifies URLs. I took it from "Check if a JavaScript string is a URL", and the code is:

function ValidURL(str) {
  var pattern = new RegExp('^(https?:\/\/)?'+ // protocol
    '((([a-z\d]([a-z\d-]*[a-z\d])*)\.)+[a-z]{2,}|'+ // domain name
    '((\d{1,3}\.){3}\d{1,3}))'+ // OR ip (v4) address
    '(\:\d+)?(\/[-a-z\d%_.~+]*)*'+ // port and path
    '(\?[;&a-z\d%_.~+=-]*)?'+ // query string
    '(\#[-a-z\d_]*)?$','i'); // fragment locater
  if(!pattern.test(str)) {
    alert("Please enter a valid URL.");
    return false;
  } else {
    return true;
  }
}

Whenever I pass the following invalid URL through this code, my browser freezes for a few minutes, and after the freeze the function returns true:

"http://www.pinevalleyscountrycreations.com/sitebuildercontent/sitebuilderpictures/ .gif"

Any ideas on what is missing in the regex's definition that causes both problems, the freeze and the wrong return value?

Thanks in advance!

user1322801
  • Seems like the problem in that regex is that you _need_ to double escape because you're building the regex as string. – elclanrs Jul 07 '13 at 09:04
  • That shouldn't make anything pause, it should throw an error. There are problems with groups, and of course `\d` in a string passed into `new RegExp` is **not** the regex character class for a digit (it's just the letter `d`). As elclanrs points out, it would have to be `\\d` if you have it in a string. – T.J. Crowder Jul 07 '13 at 09:08
  • Hmm, where should I change the \d into \\d? In the port's line? And would it then identify this string as an invalid URL? – user1322801 Jul 07 '13 at 09:11
  • See answer with 10 votes in question you posted, it has the correct regex, with properly escaped characters. – elclanrs Jul 07 '13 at 09:15
  • Thanks, but it still identifies this string as a valid URL despite its wrong file name (" .gif"). – user1322801 Jul 07 '13 at 09:27
  • Using the regex in the answer @elclanrs pointed to, I can replicate your result (well, nearly) on Chrome using [this page](http://jsbin.com/ohisar/1) (it asks you before starting). How odd. Takes over a minute (with my computer's fan ratcheting up) before it comes back and says false (not true, as you say it does for you). Changing the URL to something else but equal length [does the same thing](http://jsbin.com/ohisar/2), but changing the length without changing the components [doesn't](http://jsbin.com/ohisar/3). Sadly I don't have time right now to pick that regex apart. – T.J. Crowder Jul 07 '13 at 10:22

1 Answer

Change `[a-z\d]([a-z\d-]*[a-z\d])*` to `[a-z\d]([a-z\d-]*[a-z\d])?` (notice the last character), and it will run as expected. You also need to escape all the backslashes, because the pattern is built from string literals: change `\d` to `\\d`, `\.` to `\\.` and `\?` to `\\?`. The characters `:` and `/` do not need to be escaped at all.
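To illustrate the escaping point, here is a minimal sketch of what happens to a backslash inside a string literal before `new RegExp` ever sees it. The variable names are made up for this example and are not part of the original code.

```javascript
// In a string literal, an unrecognized escape such as "\d" silently loses its backslash.
const singleEscaped = '\d{1,3}';  // the string is actually "d{1,3}"
const doubleEscaped = '\\d{1,3}'; // the string is "\d{1,3}"

console.log(singleEscaped.length); // 6
console.log(doubleEscaped.length); // 7

const wrong = new RegExp('^' + singleEscaped + '$'); // matches one to three letters "d"
const right = new RegExp('^' + doubleEscaped + '$'); // matches one to three digits

console.log(wrong.test('123')); // false
console.log(wrong.test('ddd')); // true
console.log(right.test('123')); // true

// A regex literal avoids the double escaping entirely.
const literal = /^\d{1,3}$/;
console.log(literal.test('123')); // true
```

If the pattern does not need to be assembled from pieces, a regex literal is usually the less error-prone choice.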

The problem is that `[a-z\d]([a-z\d-]*[a-z\d])*` has many ways of matching "pinevalleyscountrycreations" (2^26 = 67108864 ways). When the overall match fails and the engine backtracks, it tries every possible way of matching the string before giving up. The group and the `?` are still necessary, since the pattern would not match single-character labels otherwise.
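The effect can be reproduced on a small scale. The sketch below isolates the two label sub-expressions and times them against inputs that are guaranteed to fail; `timeMatch` is a hypothetical helper, and the inputs are kept short on purpose so the slow pattern still finishes quickly. Exact timings depend on the engine and machine, so treat the numbers as indicative only.

```javascript
// Label sub-expression from the question: the inner group can repeat any number of times.
const nested = /^[a-z\d]([a-z\d-]*[a-z\d])*$/i;
// Label sub-expression from the fix: the inner group matches at most once.
const optional = /^[a-z\d]([a-z\d-]*[a-z\d])?$/i;

// Hypothetical helper: runs one test and reports the result with the elapsed milliseconds.
function timeMatch(regex, input) {
  const start = Date.now();
  const matched = regex.test(input);
  return { matched, ms: Date.now() - start };
}

// The trailing "!" is not allowed by either pattern, so both must fail.
// A backtracking engine explores every way of splitting the letters first.
for (const length of [12, 16, 20]) {
  const input = 'a'.repeat(length) + '!';
  console.log(length, 'nested:', timeMatch(nested, input), 'optional:', timeMatch(optional, input));
}
```

On a backtracking engine, the time for the nested pattern should grow roughly exponentially as the input gets longer, while the optional version stays flat.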

function ValidURL(str) {
  var pattern = new RegExp('^(https?://)?'+ // protocol
    '((([a-z\\d]([a-z\\d-]*[a-z\\d])?)\\.)+[a-z]{2,}|'+ // domain name
    '((\\d{1,3}\\.){3}\\d{1,3}))'+ // OR ip (v4) address
    '(:\\d+)?(/[-a-z\\d%_.~+]*)*'+ // port and path
    '(\\?[;&a-z\\d%_.~+=-]*)?'+ // query string
    '(#[-a-z\\d_]*)?$','i'); // fragment locator
  if(!pattern.test(str)) {
    alert("Please enter a valid URL.");
    return false;
  } else {
    return true;
  }
}
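
As a quick check, here is a sketch that wraps the corrected pattern in a plain predicate so it can be run outside a browser. The name `isValidUrl` and the sample inputs other than the URL from the question are made up for this example, and the `alert` call is left out so the function only returns a boolean.

```javascript
// Same pattern as the corrected function, with every backslash doubled.
const urlPattern = new RegExp('^(https?://)?' + // protocol
  '((([a-z\\d]([a-z\\d-]*[a-z\\d])?)\\.)+[a-z]{2,}|' + // domain name
  '((\\d{1,3}\\.){3}\\d{1,3}))' + // OR ip (v4) address
  '(:\\d+)?(/[-a-z\\d%_.~+]*)*' + // port and path
  '(\\?[;&a-z\\d%_.~+=-]*)?' + // query string
  '(#[-a-z\\d_]*)?$', 'i'); // fragment locator

// Hypothetical predicate: returns true or false without showing an alert.
function isValidUrl(str) {
  return urlPattern.test(str);
}

// The URL from the question: rejected because of the space in the file name.
console.log(isValidUrl('http://www.pinevalleyscountrycreations.com/sitebuildercontent/sitebuilderpictures/ .gif')); // false
console.log(isValidUrl('http://example.com/path?x=1#frag')); // true
console.log(isValidUrl('192.168.0.1:8080/index.html')); // true
```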
Markus Jarderot