
My variable htmlContent contains a string of properly formatted HTML that includes multiple img tags.

I'm trying to collect the values of all the images' src attributes into an array, srcList.

However, the following code always alerts an array containing only one src value, even though I have it set up to push every source URL into the array, not just one.

let srcList = [],
    imgSrc,
    regex = /<img.*src="(.*?)"/gi;

while ((imgSrc = regex.exec(htmlContent)) !== null) {
    srcList.push(imgSrc[1]);
}

alert(JSON.stringify(srcList));

How can I make this work as I expect it to?

Kid Diamond
  • Have you used `str.match(regexp)`? It will return an array containing the entire match result and any parentheses-captured matched results; null if there were no matches. – Daniel Reinoso May 17 '17 at 13:08
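The comment's suggestion needs one caveat: with the `g` flag, `String.prototype.match` returns every full match but drops the capture groups, so it would not give the bare src values directly. A minimal sketch comparing it with `String.prototype.matchAll` (ES2020), assuming the corrected pattern from the answer and a hypothetical sample string standing in for htmlContent:

```javascript
// Hypothetical sample input; in the question this comes from htmlContent.
const html = '<img src="foo"><img class="bat" src="bar"><img src="baz">';
const regex = /<img[^>]*src="(.*?)"/gi;

// With the g flag, match() returns every full match but no capture groups.
const fullMatches = html.match(regex);
console.log(fullMatches);
// [ '<img src="foo"', '<img class="bat" src="bar"', '<img src="baz"' ]

// matchAll() yields one match object per hit, each carrying its groups;
// the mapping function keeps only capture group 1 (the src value).
const srcList = Array.from(html.matchAll(regex), (m) => m[1]);
console.log(srcList); // [ 'foo', 'bar', 'baz' ]
```

`matchAll` requires a regex with the `g` flag (it throws a `TypeError` otherwise) and works on an internal copy, so it does not disturb the original regex's `lastIndex`.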

1 Answer


The `.*` between `img` and `src` in your regex matches greedily: it runs from the first `<img` to the last `src="` on the line, so the whole string is consumed as a single match. Your code is otherwise essentially correct, but for that reason it only captures the last image source.
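A small sketch to see the greedy behaviour directly. The sample strings and the helper `collectSrcs` are hypothetical; the helper is the same `exec()` loop as in the question. It runs the question's pattern and a `[^>]*` variant over a single-line input, then the question's pattern over a multi-line version of the same input:

```javascript
// Hypothetical single-line input, as in the question.
const htmlContent = '<img src="foo"><img class="bat" src="bar"><img src="baz">';

// Runs a global regex with exec() and collects capture group 1 of each match.
function collectSrcs(regex, text) {
  const out = [];
  let m;
  while ((m = regex.exec(text)) !== null) {
    out.push(m[1]);
  }
  return out;
}

// Question's pattern: greedy .* runs to the LAST src=" on the line, so one match.
const greedy = collectSrcs(/<img.*src="(.*?)"/gi, htmlContent);

// [^>]* cannot cross a ">", so each match stays inside a single tag.
const fixed = collectSrcs(/<img[^>]*src="(.*?)"/gi, htmlContent);

// "." does not match "\n", so the greedy pattern happens to work line by line.
const multiLine = '<img src="foo">\n<img class="bat" src="bar">\n<img src="baz">';
const greedyMultiLine = collectSrcs(/<img.*src="(.*?)"/gi, multiLine);

console.log(greedy);          // [ 'baz' ]
console.log(fixed);           // [ 'foo', 'bar', 'baz' ]
console.log(greedyMultiLine); // [ 'foo', 'bar', 'baz' ]
```

Each call gets a fresh regex literal, so `lastIndex` starts at 0 every time; reusing one global regex across different strings would require resetting `lastIndex` first.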

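Here is an example with that greedy match replaced by `[^>]*`, which cannot run past the end of an `<img>` tag. (As noted in the comments, the greedy overmatch only occurs when the HTML string is on a single line, because `.` does not match line breaks; `[^>]*` does match line breaks, so the fixed pattern also handles tags spread over several lines.)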

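let srcList = [],
  imgSrc,
  // [^>]* cannot cross a ">", so each match stays inside one <img> tag
  regex = /<img[^>]*src="(.*?)"/gi;

let htmlContent = '<img src="foo"><img class="bat" src="bar"><img src="baz">';

// With the g flag, exec() resumes from regex.lastIndex and returns null when done
while ((imgSrc = regex.exec(htmlContent)) !== null) {
  srcList.push(imgSrc[1]); // capture group 1 holds the src value
}

console.log(srcList); // [ 'foo', 'bar', 'baz' ]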

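(I feel compelled to insert the obligatory "don't try to parse HTML with regular expressions" warning here. Consider something like `$('img').each(function() { srcList.push($(this).attr('src')); });` instead. Note that `$('img')` searches the current page; to search the htmlContent string, start from `$('<div>').html(htmlContent).find('img')`.)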
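To make the warning concrete, here is a sketch of valid-HTML inputs on which the answer's pattern `/<img[^>]*src="(.*?)"/gi` returns the wrong result. The sample tags and the helper `regexSrcs` are hypothetical, and whether any of these cases matter depends on where htmlContent comes from:

```javascript
const regex = /<img[^>]*src="(.*?)"/gi;

// Hypothetical helper: extracts src values with the answer's pattern.
function regexSrcs(html) {
  return Array.from(html.matchAll(regex), (m) => m[1]);
}

// Hypothetical inputs; each is valid HTML, yet the regex result is wrong.
const results = {
  singleQuotes: regexSrcs("<img src='a.png'>"),             // [] : src missed
  unquoted: regexSrcs("<img src=b.png>"),                   // [] : src missed
  gtInAttribute: regexSrcs('<img alt="x > y" src="c.png">'), // [] : [^>]* stops at the ">" in alt
  dataSrcOnly: regexSrcs('<img data-src="d.png">'),         // [ 'd.png' ] : not a real src
  commentedOut: regexSrcs('<!-- <img src="e.png"> -->'),    // [ 'e.png' ] : inside an HTML comment
};

console.log(results);
```

A real parser avoids all five problems. In a browser, for example, `new DOMParser().parseFromString(htmlContent, "text/html").querySelectorAll("img[src]")` returns only genuine, uncommented img elements that have a `src` attribute, whatever its quoting.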

Daniel Beck
  • Right only if document source code is within a whole line. – revo May 17 '17 at 13:22
  • True that. All the more reason to not use regex when parsing html. – Daniel Beck May 17 '17 at 13:23
  • There is no problem using regular expressions here. It's just elements and attributes; I don't call that HTML parsing. – revo May 17 '17 at 13:25
  • Well, it's HTML, and we're parsing it, so... ¯\_(ツ)_/¯ YMMV but pretty much every time I've thought to myself "oh, this is just a simple match, I can use a regex" I've wound up regretting it. For one-off throwaway code, sure, fine; for anything that I'm going to have to maintain it's usually better to do it the right way in the first place. – Daniel Beck May 17 '17 at 13:34
  • You see it as HTML, I see it as raw text. Regular expressions are not that "regular" any more; as a powerful text-processing tool, in cases like this there wouldn't be any problem, as long as you know what you are doing. – revo May 17 '17 at 13:57
  • Well, hey, I don't agree; but to each his own, you do you, different strokes, etc – Daniel Beck May 17 '17 at 14:03
  • Sure but then you wouldn't disagree. – revo May 17 '17 at 18:49