
I am trying to extract the img src from a long HTML string.

I know there are a lot of questions about how to do this, but I have tried them and gotten the wrong result. My question, though, is specifically about the contradictory results.

I am using:

var url = "<img height=\"100\" src=\"data:image/png;base64,testurlhere\" width=\"200\"></img>";
var regexp = /<img[^>]+src\s*=\s*['"]([^'"]+)['"][^>]*>/g;
var src = url.match(regexp);

But this does not extract the src properly. I keep getting src = <img height="100" src="data:image/png;base64,testurlhere" width="200"></img> instead of data:image/png;base64,testurlhere.

However, when I try this on the regex tester at regex101, it extracts the src correctly. What am I doing wrong? Is match() the wrong function to use?

llams48

4 Answers


If you need the whole img tags for some reason:

const imgTags = html.match(/<img [^>]*src="[^"]*"[^>]*>/gm);

Then you can extract the source link of every img tag into an array like this:

// match() returns null when no img tag is found, hence the || [] fallback
const sources = (html.match(/<img [^>]*src="[^"]*"[^>]*>/gm) || [])
                          .map(x => x.replace(/.*src="([^"]*)".*/, '$1'));
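If your runtime supports `String.prototype.matchAll` (ES2020), the two steps above can be collapsed into one, because each result keeps its capture groups. A minimal sketch; the `html` string is a made-up example with two img tags, and, like the original pattern, it only handles double-quoted src values:

```javascript
// Hypothetical input with two img tags (not from the question).
const html = '<p>Logo</p><img alt="a" src="a.png"><img src="b.png" width="10">';

// matchAll requires a global regex and yields one result per match;
// each result is an array where index 1 is the first capture group (the src value).
const sources = Array.from(
  html.matchAll(/<img [^>]*src="([^"]*)"[^>]*>/g),
  m => m[1]
);

console.log(sources); // → ["a.png", "b.png"]
```

Unlike `match()`, `matchAll()` returns an empty iterator rather than null when nothing matches, so no fallback is needed.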
Vi0nik

I'm not a big fan of using regex to parse HTML content, so here is the longer way:

var url = "<img height=\"100\" src=\"data:image/png;base64,testurlhere\" width=\"200\"></img>";
var tmp = document.createElement('div');
tmp.innerHTML = url;
var src = tmp.querySelector('img').getAttribute('src');
console.log(src);
Arun P Johny
  • OP, I gave you the literal answer to your question; but this here is what you would be advised to be doing instead. – Amadan Jul 22 '15 at 04:27

Try this:

var match = regexp.exec(url); // [fullMatch, group1, ...] or null if nothing matched
var src = match[1];           // first capture group: the src value
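The contradictory results most likely come from the `g` flag: with a global regex, `String.prototype.match` returns only the full matches and drops capture groups, whereas regex101 displays each match's groups. `exec` returns the groups for one match at a time. A minimal sketch reusing the question's input and pattern, looping `exec` so that several img tags would also be collected:

```javascript
const url = "<img height=\"100\" src=\"data:image/png;base64,testurlhere\" width=\"200\"></img>";
const regexp = /<img[^>]+src\s*=\s*['"]([^'"]+)['"][^>]*>/g;

// With the g flag, match() returns only full matches: here, one element
// holding the whole <img ...> tag, with no capture group.
const full = url.match(regexp);

// exec() returns [fullMatch, group1, ...] for a single match. Because the
// regex is global, each call advances regexp.lastIndex, so it can be looped
// until it returns null (which also resets lastIndex to 0).
const srcs = [];
let m;
while ((m = regexp.exec(url)) !== null) {
  srcs.push(m[1]); // m[0] is the whole tag, m[1] is the src value
}

console.log(full.length); // → 1
console.log(srcs);        // → ["data:image/png;base64,testurlhere"]
```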
Amadan
  • Thanks, this works too. Just wondering, why does match[0] return the original string and match[1] return the substring that we are actually looking for? Is it always the case that the 2nd element in the resulting array will be the desired result? – llams48 Jul 23 '15 at 03:35
  • @llams48: `match[1]` is the 1st capture group, `match[2]` is the second... and `match[0]` is the full match. – Amadan Jul 23 '15 at 03:44
const src = url.slice(url.indexOf("src")).split('"')[1]

Regex gives me headaches. Boohoo.

Find the index of src in the HTML string (the variable url in the question), slice the string from there, and then split it on the " characters. The second item in the resulting array is your src link.
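Spelled out step by step on the question's input, the one-liner works as in the sketch that follows. The last two lines use a made-up tag (not from the question) to show a limitation: `indexOf("src")` finds the first occurrence of the text "src" anywhere, including inside attributes such as data-src, and the split assumes a double-quoted value.

```javascript
const url = "<img height=\"100\" src=\"data:image/png;base64,testurlhere\" width=\"200\"></img>";

// Step 1: drop everything before the first occurrence of "src".
const fromSrc = url.slice(url.indexOf("src"));
// fromSrc is now: src="data:image/png;base64,testurlhere" width="200"></img>

// Step 2: split on double quotes; index 0 is 'src=' and index 1 is the value.
const parts = fromSrc.split('"');
const src = parts[1];
console.log(src); // → data:image/png;base64,testurlhere

// Limitation: a data-src attribute before src is picked up instead.
const tricky = '<img data-src="lazy.png" src="real.png">';
console.log(tricky.slice(tricky.indexOf("src")).split('"')[1]); // → lazy.png
```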

  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/32046344) – Aaron Meese Jun 22 '22 at 22:32