-1

I have the following regular expression.

/<img.+src=['"](?P<src>.+?)['"].*>/i

However, when I run this on any string that has more than one image in it, it returns the last image. In fact, it returns the last src occurrence regardless of whether it belongs to an image or not.

This is because it matches from <img to the end of the line instead of stopping at the closing > of the tag.

How can I change my regex to stop at the > of the <img> tag?

Take a look at this example:

https://regex101.com/r/QNQA72/2

Chris James Champeau

2 Answers

2

Change .* to .*?, and .+ to .+?

  • .* is greedy, which matches as much as possible
  • .*? is reluctant, which matches as little as possible

Same goes for the + versions.
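A minimal sketch of the difference, written in JavaScript as an assumption about the target language. The pattern in the question uses the PCRE/Python-style (?P<src>...) named group, which JavaScript spells (?<src>...). The sample string `html` is made up for illustration.

```javascript
// Hypothetical sample input: two images on a single line.
const html = '<img src="a.png" alt="A"> text <img src="b.png" alt="B">';

// Greedy: .+ runs to the end of the line, then backtracks to the LAST src=.
const greedy = /<img.+src=['"](?<src>.+?)['"].*>/i;

// Reluctant: .+? grows one character at a time and stops at the FIRST src=.
const lazy = /<img.+?src=['"](?<src>.+?)['"].*?>/i;

const greedySrc = html.match(greedy).groups.src;
const lazySrc = html.match(lazy).groups.src;

console.log(greedySrc); // b.png
console.log(lazySrc);   // a.png
```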

Bohemian
0

To answer your final question exactly:

How can I change my regex to stop at the > of the <img> tag

you could simply turn .+ into [^>]+ (and, so that the match cannot run past the end of the tag, the trailing .* into [^>]*):

/<img[^>]+src=['"](?P<src>.+?)['"][^>]*>/i
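As a rough sketch of how a negated character class keeps each match inside one tag, here is a JavaScript variant that collects every src. Several things here are assumptions rather than part of the pattern above: the JavaScript named-group syntax (?<src>...), the g flag, the use of [^>]* before the closing >, and the sample string `html`.

```javascript
// Hypothetical sample input; the second <img> has no src attribute.
const html = `<img class="x" src="a.png"><img alt="no source"><img src='b.png'>`;

// [^>] can never match ">", so a single match cannot spill into the next tag.
const pattern = /<img[^>]+src=['"](?<src>.+?)['"][^>]*>/gi;

// matchAll requires the g flag and yields one match object per <img> with a src.
const sources = Array.from(html.matchAll(pattern), (m) => m.groups.src);

console.log(sources); // [ 'a.png', 'b.png' ]
```

The tag without a src is skipped instead of borrowing the src of a later tag, which is what the original .+ version would do.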

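But it's not a very good solution, because [^>]+ is still greedy: it first consumes everything up to the end of the tag and then has to backtrack to find src=, which makes the regex engine work harder than necessary.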

A better solution is to work in two steps: first select the entire <img> elements, then look for src inside each one.
So if you want to show each src in your string:

const images = string.match(/<img[^>]+>/ig) || []; // match() returns null when there is no <img> at all
for (const img of images) {
    const match = img.match(/src=(["'])([^'"]*)\1/);
    if (match) { // (avoid error when <img> doesn't contain src)
        console.log(match[2]);
    }
}

Note how we look for both src="..." and src='...', capturing the opening quote with (["']) and then using the backreference \1 to ensure the closing quote is the same.
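To illustrate what the backreference buys, here is a small sketch with three made-up tags. The variable names are hypothetical, and the character class contains only the two quote characters.

```javascript
// (["']) captures the opening quote; \1 requires the same character to close.
const srcAttr = /src=(["'])([^'"]*)\1/;

const double = '<img src="a.png">'.match(srcAttr);
const single = "<img src='b.png'>".match(srcAttr);
const mismatched = `<img src="c.png'>`.match(srcAttr);

console.log(double[2]);  // a.png
console.log(single[2]);  // b.png
console.log(mismatched); // null
```

The third tag opens with a double quote and closes with a single one, so \1 fails and match() returns null, which is why the loop above checks the result before reading match[2].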

cFreed