2

I currently have str.match(/(http[^\s]+)/i), which captures not only links in the content but also the URLs inside img tags (src="http...") and anchor tags (href="http...").

How do I modify my regex so that it matches only http/https URLs that have no "src=" or "href=" before them?

9blue

3 Answers

3

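You can add a leading \s. Inside an href or src attribute the URL is not preceded by a whitespace character, whereas in normal text it usually is. Note that a URL at the very start of the string has no preceding whitespace either, so this pattern will miss it.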

str.match(/\s(http[^\s]+)/i)
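As a rough sketch of how this could be used to collect every link rather than just the first, the variant below adds the g flag and swaps \s for (?:^|\s) so a URL at the start of the string is also picked up. The sample string and its URLs are made up for illustration, and the (?:^|\s) change is my assumption, not part of the answer above.

```javascript
// Hypothetical sample input; the markup and URLs are invented for this example.
const str = 'http://a.example/start <img src="http://a.example/pic.png"> see http://a.example/page now';

// (?:^|\s) accepts either the start of the string or a whitespace character
// before the URL. matchAll requires the g flag and yields one match object
// per hit; capture group 1 holds the URL without the leading whitespace.
const urls = Array.from(str.matchAll(/(?:^|\s)(http[^\s]+)/gi), m => m[1]);

console.log(urls);
// → [ 'http://a.example/start', 'http://a.example/page' ]
```

The URL inside src="..." is skipped because it is preceded by a quote, not by whitespace. An unquoted attribute written with a space after the equals sign would still slip through.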


MaxZoom
ByteHamster
1

You can match only links where the http/s is not immediately preceded by an = or a double quote:

str.match(/[^=\"](http[^\s]+)/i)
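Because [^="] has to consume one character, the full match includes that character and the URL itself is in capture group 1. A minimal sketch of reading the group, assuming a made-up sample string; the (?:^|...) alternative is my addition so that a URL at the very start of the input is not missed:

```javascript
// Hypothetical sample input; the markup and URLs are invented for this example.
const html = '<a href="http://a.example/link">x</a> text http://a.example/plain end';

// Either the start of the string, or any character other than = or ",
// must come right before the URL. Group 1 is the URL on its own.
const found = Array.from(html.matchAll(/(?:^|[^="])(http[^\s]+)/gi), m => m[1]);

console.log(found);
// → [ 'http://a.example/plain' ]
```

Attributes written with single quotes would still match here; adding ' to the character class would cover that case.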
Dmitry Sadakov
0

A simple http[^\s]+ (equivalent to http\S+) will overmatch.

I'd suggest using a regex that matches text only outside of tags, and whitelisting the tags inside which the URL is allowed to appear. Here is the regex:

/(?![^<]*>|[^<>]*<\/(?!p\b|td|pre))https?:\/\/[a-z0-9&#=.\/\-?_]+/gi

The (?!p\b|td|pre) part is where the whitelisted tags are added. In text such as http://example.com, the regex won't capture the trailing comma.
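To illustrate how the two lookahead branches behave, here is a small sketch run against a made-up HTML fragment (the markup and URLs are assumptions for this example only). The first branch rejects a URL that sits inside a tag, and the second rejects a URL whose next tag is a closing tag that is not on the whitelist.

```javascript
// Hypothetical sample input; the markup and URLs are invented for this example.
const markup =
  '<p>Visit http://example.com, or ' +
  '<a href="http://example.org/x">http://example.org/y</a></p>' +
  '<p>Docs: https://example.net/docs?id=1</p>';

// Same pattern as above: skip URLs inside a tag, and skip URLs followed by
// a closing tag other than </p>, </td...> or </pre...>.
const re = /(?![^<]*>|[^<>]*<\/(?!p\b|td|pre))https?:\/\/[a-z0-9&#=.\/\-?_]+/gi;

// With the g flag, String.prototype.match returns every full match.
const links = markup.match(re);

console.log(links);
// → [ 'http://example.com', 'https://example.net/docs?id=1' ]
```

The href value and the anchor text are both skipped, and the comma after http://example.com is left out because it is not in the character class. Since td and pre are written without \b, they also whitelist any tag name that merely starts with those letters.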


Wiktor Stribiżew