
I found an answer on this forum that is close to what I need, but not quite enough (Regexp to capture string between delimiters).

My question is: I have an HTML page, and I would like to get only the src of every "img" tag on the page and put them all in one array, without using cheerio (I'm using Node.js).

The problem is that I would prefer to exclude the delimiters (the surrounding quotes and tag text) from the result. How can I solve this?

budi
Davide Modesto

1 Answer


Yes, this is possible with a regex, but it would be much easier (and probably faster, though don't quote me on that) to use a native DOM method. Let's start with the regex approach. A capture group lets us pull out just the src value of an img tag, without the quotes around it:

var html = `test<div>hello</div>
<img src="first">
<img class="test" src="second" data-lang="en">
test
<img src="third" >`;
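// Group 2 is the src value without its quotes; \1 makes the closing
// quote match the opening one, and \s keeps data-src from matching.
var re = /<img\b[^<>]*?\ssrc=(['"])(.*?)\1[^<>]*>/gi;
var srcs = Array.from(html.matchAll(re), m => m[2]);

console.log(srcs); // [ 'first', 'second', 'third' ]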
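If you need this in more than one place, the regex can be wrapped in a small helper. This is only a sketch: getImgSrcs is a hypothetical name, and it assumes the src value is double-quoted, single-quoted, or unquoted without whitespace. A regex still cannot cope with every quirk of real HTML (comments, scripts, a ">" inside an attribute value), so treat it as a quick scraper rather than a parser.

```javascript
// Hypothetical helper: collect the src value of every <img> tag in an HTML string.
// Handles src="...", src='...' and unquoted src=..., returning the value
// without the surrounding quotes.
function getImgSrcs(html) {
  // \s before "src" avoids matching attributes such as data-src.
  // Group 1: double-quoted value, group 2: single-quoted, group 3: unquoted.
  const re = /<img\b[^>]*?\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))[^>]*>/gi;
  const srcs = [];
  for (const m of html.matchAll(re)) {
    // Exactly one of the three groups participated in the match.
    srcs.push(m[1] ?? m[2] ?? m[3]);
  }
  return srcs;
}

const page = `<img src="a.png"><IMG alt='x' src='b.png'>
<img data-src="lazy.png" src=c.png>`;
console.log(getImgSrcs(page)); // [ 'a.png', 'b.png', 'c.png' ]
```

The "i" flag makes upper-case tags like <IMG> match too, and the ?? chain keeps an empty src="" as an empty string instead of dropping it.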

However, when a DOM is available (for example in the browser; plain Node.js has no document object), the better way is to use getElementsByTagName:
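(Note: img.src returns the URL resolved against the page's address, so these relative/fake srcs will come back prefixed with the page's base URL. Use img.getAttribute('src') if you want the value exactly as written. You get the idea.)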

// Convert the live HTMLCollection to an array and read each resolved src.
var srcs = Array.from(document.getElementsByTagName('img'), img => img.src);

console.log(srcs);
where the page contains this markup:
test<div>hello</div>
<img src="first">
<img class="test" src="second" data-lang="en">
test
<img src="third" >
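In the browser, img.src hands back the absolute URL, while the regex returns the attribute exactly as written. When scraping from Node.js you can do a similar resolution yourself with the built-in WHATWG URL class. A minimal sketch: resolveSrcs and the example base URL are hypothetical, and it ignores any <base href> element in the page, which a browser would take into account.

```javascript
// Hypothetical helper: turn raw src values into absolute URLs, resolving
// relative ones against the address the page was fetched from.
function resolveSrcs(srcs, pageUrl) {
  return srcs.map((src) => {
    try {
      return new URL(src, pageUrl).href;
    } catch {
      // Leave values that cannot be parsed as URLs unchanged.
      return src;
    }
  });
}

const absolute = resolveSrcs(
  ["first", "/img/second.png", "https://cdn.example.com/third.png"],
  "https://example.com/gallery/index.html"
);
// absolute holds:
//   https://example.com/gallery/first
//   https://example.com/img/second.png
//   https://cdn.example.com/third.png
console.log(absolute);
```

Relative paths resolve against the page's directory, root-relative paths against the host, and already-absolute URLs pass through untouched.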
Damon
  • Thank you very much. I'm not using cheerio because I think it's much slower. I fetch the HTML page with the request module, and then I just want to extract the src of every image. – Davide Modesto Jun 17 '17 at 07:27
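Since the HTML is downloaded in Node.js anyway, fetching and extracting can be combined into one function. This is a sketch under stated assumptions: it uses the fetch built into Node.js 18 and later instead of the request module mentioned in the comment, fetchImgSrcs is a hypothetical name, and the regex carries the usual limits of regex-based HTML scraping.

```javascript
// Sketch: download a page and return the absolute URL of every <img src>.
// Assumes Node.js 18+ (global fetch) instead of the request module;
// any fetch-compatible function can be passed as fetchImpl.
async function fetchImgSrcs(pageUrl, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(pageUrl);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${pageUrl}`);
  const html = await res.text();

  // Group 2 is the value between matching quotes (the delimiters are excluded).
  const re = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi;
  const srcs = [];
  for (const m of html.matchAll(re)) {
    // res.url is the final URL after redirects; fall back to pageUrl if empty.
    srcs.push(new URL(m[2], res.url || pageUrl).href);
  }
  return srcs;
}

// Usage (needs network access):
// fetchImgSrcs("https://example.com/").then(console.log).catch(console.error);
```

Resolving against res.url rather than the original address matters when the server redirects, since relative srcs are relative to the page that was actually served.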