
I am trying to limit the number of URLs someone can add to a textarea. I have a regex that can find all the URLs just fine. Here is the issue I'm running into:

When two links are right next to each other, but separated by a space, the code I've written will not identify the second link. If I separate them by two spaces, then they will be identified. I have no idea what is going on here.

function limitLinks() {
  var counter = 0;
  var url = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/?)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\)){0,}(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s\!()\[\]{};:\'\"\.\,<>?«»“”‘’]){0,})/ig;
  var sig = $(".textarea").val();
  var sigsplit = sig.replace(/\n/g, " ").split(" ");
  var links = [];
  for (var i = 0; i < sigsplit.length; i++) {
    if (url.test(sigsplit[i])) {
      counter++;
      links.push(sigsplit[i]);
    }
  }
  console.log("Here's the split text:" + sigsplit);
  console.log("All the links are:" + links);
}

limitLinks();
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<form>
  <textarea class="textarea">foo.com bar.com</textarea>
</form>

With one space separating the URLs, it only identifies the first link. If you add a second space between the URLs, it identifies both. I don't think the regex is the problem here; it finds both URLs just fine. I think it has something to do with the array that is created when I split the textarea value by spaces.

Any insight would be really appreciated... I've spent too much time trying to figure this out!

jakeehoffmann

1 Answer


After further research, I found that this question is a duplicate of "Why RegExp with global flag in Javascript give wrong results?". I will keep this answer here as notes on how I got there.


Your regex seems to have some kind of "memory".

Look at this:

var url = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/?)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\)){0,}(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s\!()\[\]{};:\'\"\.\,<>?«»“”‘’]){0,})/ig;

function isLink(str) {
  return url.test(str);
}

console.log('odd.com even.com odd.com even.com odd.com'.split(' ').filter(isLink));

It returns ["odd.com", "odd.com", "odd.com"].
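One way to see that "memory" is to log the regex's `lastIndex` property around each `test()` call. A minimal sketch: to keep it short it uses a simplified stand-in pattern instead of the full URL regex (an assumption; only the `g` flag matters here), and it runs in Node or a browser console.

```javascript
// Simplified stand-in for the long URL regex (assumption); note the g flag.
const url = /\b[a-z0-9.-]+\.[a-z]{2,4}\b/gi;

const words = "odd.com even.com odd.com even.com odd.com".split(" ");

const trace = words.map((word) => {
  const before = url.lastIndex;   // index where this test() call starts searching
  const matched = url.test(word); // success moves lastIndex past the match; failure resets it to 0
  return { word, before, matched, after: url.lastIndex };
});

console.table(trace);
```

Each "even.com" test starts at index 7, the final "m", where the pattern cannot begin a match, so it fails and sets `lastIndex` back to 0 for the following word.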

Now try this:

function isLink(str) {
  var url = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/?)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\)){0,}(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s\!()\[\]{};:\'\"\.\,<>?«»“”‘’]){0,})/ig;
  return url.test(str);
}

console.log('odd.com even.com odd.com even.com odd.com'.split(' ').filter(isLink));

It returns ["odd.com", "even.com", "odd.com", "even.com", "odd.com"].

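Fascinating, huh? The regex acts differently between executions because of the `g` flag: each successful `test()` call stores the position just past the match in the regex's `lastIndex` property, and the next call starts searching from there, even on a different string. A failed call resets `lastIndex` to 0. Creating a new regex on every function call resets that memory each time. This also explains the two-space behavior in the question: splitting "foo.com  bar.com" on single spaces produces an empty string between the URLs, the test on that empty string fails and resets `lastIndex`, so "bar.com" is searched from the start.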
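Recreating the regex is one way to clear that state; another is to keep a single global regex and set its `lastIndex` back to 0 before every `test()` call. A sketch, again with a simplified stand-in pattern (an assumption) and the answer's `isLink` helper name:

```javascript
// Simplified stand-in for the long URL regex (assumption); the g flag is kept.
const url = /\b[a-z0-9.-]+\.[a-z]{2,4}\b/gi;

function isLink(str) {
  url.lastIndex = 0; // forget where the previous call stopped
  return url.test(str);
}

const links = "odd.com even.com odd.com even.com odd.com".split(" ").filter(isLink);
console.log(links);
```

This keeps the same regex object usable for global searches elsewhere, at the cost of having to remember the reset.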

To fix this, I removed the `g` flag from the end of the regex.
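Applied to the original question, a sketch of the non-global version might look like this. `findLinks` is a hypothetical helper, the pattern is a simplified stand-in for the full URL regex (an assumption), and it takes a plain string so it runs without jQuery; in the page you would pass it `$(".textarea").val()`. Splitting on `/\s+/` (any run of whitespace) also removes the need to replace newlines first.

```javascript
// Simplified stand-in URL pattern (assumption), WITHOUT the g flag:
// test() then always searches from index 0 and keeps no state between calls.
const url = /\b[a-z0-9.-]+\.[a-z]{2,4}\b/i;

// Hypothetical helper: returns the whitespace-separated words that look like links.
function findLinks(text) {
  return text
    .split(/\s+/)       // split on any run of spaces, tabs or newlines
    .filter(Boolean)    // drop empty strings from leading/trailing whitespace
    .filter((word) => url.test(word));
}

const oneSpace = findLinks("foo.com bar.com");
const twoSpaces = findLinks("foo.com  bar.com");
const newline = findLinks("foo.com\nbar.com and some text");

console.log(oneSpace.length, twoSpaces.length, newline.length);
```

Counting is then `findLinks(text).length`, which can be compared against whatever limit you choose.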


Here is an even smaller reproducible example:

var re = /a./g;
console.log('a1 a2 a3 a4 a5 a6'.split(' ').filter(s => re.test(s)));
// returns ["a1", "a3", "a5"]

Without the g flag, it returns ["a1", "a2", "a3", "a4", "a5", "a6"].
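A sketch reproducing the one-space versus two-space difference from the question with the same small regex:

```javascript
const re = /a./g;

// One space between items: ["a1", "a2"]
const oneSpace = "a1 a2".split(" ");
// Two spaces: split(" ") yields an empty string in between: ["a1", "", "a2"]
const twoSpaces = "a1  a2".split(" ");

// "a2" is searched from lastIndex 2 (its end), so it fails.
const fromOne = oneSpace.filter((s) => re.test(s));

re.lastIndex = 0; // start the second run from a clean state
// The empty string fails first and resets lastIndex to 0, so "a2" is found.
const fromTwo = twoSpaces.filter((s) => re.test(s));

console.log(fromOne, fromTwo);
```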

Community
000
  • Yes, the `g` flag makes the regexp keep memory, so you can use it in a loop to find all the matches. Each time you use it, it starts from where it ended the previous time. – Barmar Mar 29 '17 at 21:20
  • Ugh, so it WAS a regex problem after all! Sigh. Thank you for your help! I knew it had to be a simple answer... and it was. Thank you! – Raquel Smith Mar 29 '17 at 21:26
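The loop Barmar describes is usually written with `exec()`: with the `g` flag each call resumes at `lastIndex`, and it returns `null` (resetting `lastIndex` to 0) once no match is left. A short sketch using the answer's small regex, plus `String.prototype.match` for when only a count is needed:

```javascript
// With the g flag, each exec() call resumes searching at re.lastIndex,
// so a loop visits every match in one string.
const re = /a./g;
const text = "a1 a2 a3";

const found = [];
let m;
while ((m = re.exec(text)) !== null) {
  found.push({ match: m[0], index: m.index, lastIndex: re.lastIndex });
}
// When exec() finds nothing more it returns null and resets re.lastIndex to 0.
console.log(found, re.lastIndex);

// For a simple count, String.prototype.match with a global regex
// returns every match at once (or null when there are none).
const count = (text.match(re) || []).length;
console.log(count);
```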