For example, I have this text:

www.google.com

<a href="www.google.com"> Google Homepage </a>

I wrote (<a.*<\/a>), which captures the anchor tag, and (www\.[\S]+(\b|$)), which matches any text that starts with www. What I want is to match only the standalone www.google.com, not the one inside the anchor tag.

Is there any way to ignore the anchor tag completely and match only in the remaining text?

To be more precise, I need a regex that behaves like: NOT (<a.*<\/a>) AND (www\.[\S]+(\b|$))

I hope my question is clear. Thanks for helping.


1 Answer

As I understand it, you want to match each URL (starting with www.) when it is not inside the href attribute.

This works with a negative lookbehind:

(?<!href=")(www\.[\S]+(\b|$))

This regex matches the URL only when it is not directly preceded by href=".
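As a quick check, here is a minimal sketch that runs the lookbehind pattern above against the sample text from the question. It uses Python's `re` module purely for illustration (the question does not name a language), and the variable names are my own.

```python
import re

# Sample text from the question: one bare URL and one inside an anchor tag.
text = 'www.google.com\n\n<a href="www.google.com"> Google Homepage </a>'

# Negative lookbehind: reject any match that is directly preceded by href="
pattern = re.compile(r'(?<!href=")(www\.[\S]+(\b|$))')

# Group 1 holds the URL itself.
matches = [m.group(1) for m in pattern.finditer(text)]
print(matches)  # ['www.google.com']
```

Only the bare URL is returned. The lookbehind checks for the literal text href=" only, so an attribute written with single quotes or extra spaces would not be excluded.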

Be aware that JavaScript engines without ES2018 support do not have negative lookbehind. I tested this on https://regex101.com/

Edit due to additions in the comments: if you want to exclude everything inside an HTML tag (anything that comes before a closing >), this should work for you:

(?![^<]*>)(([a-zA-Z0-9\-\_\.])+@[a-zA-Z\_]+?(\.[a-zA-Z]{2,6})+)

It's a negative lookahead: the match is rejected when the current position is followed by any number of characters other than < and then a >, which means the position is inside a tag.

The good thing about negative lookahead is that it is supported in JS :)
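The lookahead approach can be sketched the same way. The pattern in this answer targets e-mail addresses, so the sketch below first runs it unchanged on a made-up sample string (the addresses are hypothetical), and then combines the same `(?![^<]*>)` guard with the www pattern from the question. That second combination is my own adaptation, not something stated in the answer. Python's `re` module is used only for illustration.

```python
import re

# Guard from the answer: fail if a ">" is reached before any "<",
# i.e. the current position lies inside a tag.
INSIDE_TAG_GUARD = r'(?![^<]*>)'

# 1) The answer's pattern, unchanged, on a hypothetical sample with e-mail addresses.
email_text = 'Contact bob@example.com or <a href="mailto:alice@example.com">mail us</a>'
email_pattern = re.compile(
    INSIDE_TAG_GUARD + r'(([a-zA-Z0-9\-\_\.])+@[a-zA-Z\_]+?(\.[a-zA-Z]{2,6})+)'
)
emails = [m.group(1) for m in email_pattern.finditer(email_text)]
print(emails)  # ['bob@example.com']

# 2) Assumed adaptation: the same guard in front of the www pattern from the question.
url_text = 'www.google.com\n\n<a href="www.google.com"> Google Homepage </a>'
url_pattern = re.compile(INSIDE_TAG_GUARD + r'(www\.[\S]+(\b|$))')
urls = [m.group(1) for m in url_pattern.finditer(url_text)]
print(urls)  # ['www.google.com']
```

Note that the guard only skips text inside the angle brackets of a tag. Text between the opening <a ...> and the closing </a> is still searched, so a URL used as link text would still be matched.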

Lars-Olof Kreim