
So I have this regex that I designed, but I can't get it to exclude links that already have target="_blank", or anchors such as <a name="..."> or <a href="#...">. How would I skip links that already have target="_blank" and avoid adding target="_blank" to anchor links?

Find: <a href=(".*)|([^#][^"]*)\\s>(\w.*)(</a>)
Replace: <a href=$1 target="_blank"$2$3

user3683976

1 Answer


Regex is notoriously the wrong tool for this job.

HTML is structured data that regex doesn't understand, which means you run into exactly the sort of issues you're having: for any non-trivial problem, the many allowed variations in HTML structure make it very difficult to parse using string manipulation techniques.
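
To make that concrete, here is a minimal sketch of how quickly a pattern breaks. The pattern naiveLink and the sample strings are made up for illustration (they are not the regex from the question); the pattern assumes href is the first attribute, is double-quoted, and that the tag is lowercase.

```javascript
// Hypothetical naive pattern: href must come first and be double-quoted.
const naiveLink = /<a href="[^"#][^"]*">/;

// A browser treats all four of these as the same external link.
const variants = [
  '<a href="http://example.com">test</a>',             // the form the pattern expects
  '<a class="ext" href="http://example.com">test</a>', // another attribute comes first
  "<a href='http://example.com'>test</a>",             // single-quoted value
  '<A HREF="http://example.com">test</A>',             // uppercase tag and attribute
];

// Record which variants the pattern recognises.
const matches = variants.map((html) => naiveLink.test(html));
console.log(matches); // only the first entry is true
```

Each missed variant can be patched with a more elaborate pattern, but every patch tends to open up a new edge case.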

DOM methods are designed for manipulating that sort of data, so use them instead. The following loops through every <a> tag in the document, skips those with no href attribute, those whose href begins with '#', and those with a name attribute, and sets the target attribute on the rest.

Array.from(document.getElementsByTagName('a')).forEach(function(a) {
  if (
    a.getAttribute("href") &&                     // has a non-empty href
    a.getAttribute("href").indexOf('#') !== 0 &&  // href is not an in-page anchor
    a.getAttribute("name") === null               // not a named anchor
  ) {
    // Has no effect on links that already have target="_blank"
    a.setAttribute('target', '_blank');
  }
});

// Just to confirm:
console.log(document.getElementById('container').innerHTML);

<!-- Test markup the script above runs against -->
<div id="container">
  <a href="http://example.com">test</a>
  <a href="#foo">test2</a>
  <a href="http://example.com" target="_blank">test3</a>
  <a name="foo">test4</a>
</div>
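
If you would rather leave links that already carry a target completely untouched, the check can be pulled out into a small function. This is only a sketch under that assumption: needsBlankTarget is a hypothetical helper name, and skipping links with any existing target is a variation on the loop above, which sets the attribute regardless. Missing attributes are passed as null, which is what getAttribute() returns.

```javascript
// Hypothetical helper: decides from plain attribute values whether a link
// should receive target="_blank". Pass null for attributes that are absent.
function needsBlankTarget(href, name, target) {
  if (!href) return false;                // no href, e.g. <a name="foo">
  if (href.startsWith('#')) return false; // in-page anchor link
  if (name !== null) return false;        // named anchor
  if (target !== null) return false;      // assumption: keep any existing target as-is
  return true;
}

// Browser-only part; skipped when no DOM is available.
if (typeof document !== 'undefined') {
  document.querySelectorAll('a').forEach(function (a) {
    var href = a.getAttribute('href');
    var name = a.getAttribute('name');
    var target = a.getAttribute('target');
    if (needsBlankTarget(href, name, target)) {
      a.setAttribute('target', '_blank');
    }
  });
}
```

Keeping the decision in a function that takes plain values means it can be checked without a page loaded, for example needsBlankTarget('#foo', null, null) returns false.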
Daniel Beck
  • I tried the code above and it didn't work. I've done numerous regex replacements in HTML files without using JavaScript, and I don't need JavaScript for this. I can get the regex to add target="_blank" to links. However, it also adds it to links that already have it. I guess I'll have to write another regex just to remove the duplicate target="_blank" – user3683976 Aug 02 '18 at 10:49
  • And then you'll need to expand your regex to handle tags that include other attributes, then tags with the attributes in a different order, then accidental matches on content inside attributes that resembles what you're really looking for, and so on. If you can expand on "it didn't work" then I'd be happy to help; if you want to keep on with regex, feel free, and good luck – Daniel Beck Aug 02 '18 at 15:05