Extract image tags from a string and put them in an array

Question

I need to deal with a very strange string response. I need to extract all the image tags from that string and put them in an array, so that I can iterate through the array and render the images.

The sample string

var str = '<p>↵   This is the cap you unscrew to open when you refuel your car↵</p>↵↵<p>↵ New line↵</p>↵↵<p>↵ <img alt="blah" src="https://www.imgone.com/wp-content/uploads/2011/04/Tyre-Illustration-500.jpg" />↵</p>Random Text <img alt="blah" src="https://www.imgtwo.com/wp-content/uploads/2011/04/Tyre-Illustration-500.jpg" />'

The expected result would be

['<img alt="blah" src="https://www.imgone.com/wp-content/uploads/2011/04/Tyre-Illustration-500.jpg" />', '<img alt="blah" src="https://www.imgtwo.com/wp-content/uploads/2011/04/Tyre-Illustration-500.jpg" />']

score 1 · Answer 1 · answered Apr 28 '19 at 17:51

You can use the regex /<img .*?>/g and call exec in a loop to collect each match, like this:

var re = /<img .*?>/g;
var str = '<p>↵   This is the cap you unscrew to open when you refuel your car↵</p>↵↵<p>↵ New line↵</p>↵↵<p>↵ <img alt="blah" src="https://www.imgone.com/wp-content/uploads/2011/04/Tyre-Illustration-500.jpg" />↵</p>Random Text <img alt="blah" src="https://www.imgtwo.com/wp-content/uploads/2011/04/Tyre-Illustration-500.jpg" />';

var m;
var result = [];
// With the g flag, each exec() call resumes at re.lastIndex
// and returns null once no matches remain
do {
    m = re.exec(str);
    if (m) {
        result.push(m[0]); // m[0] is the full matched tag
    }
} while (m);
//var tmp = str.replace(/<img .*?>/g,"");
console.log(result);
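
As an alternative to the exec loop, here is a minimal sketch using String.prototype.matchAll (ES2020). The sample string and the names html and imgTags are made up for illustration; the regex is the same one as above.

```javascript
// Illustrative input, shortened from the question's string.
const html = '<p>Intro</p><p><img alt="one" src="https://example.com/one.jpg" /></p>Text <img alt="two" src="https://example.com/two.jpg" />';

// matchAll requires the g flag; each match object holds the full tag at index 0.
const imgTags = Array.from(html.matchAll(/<img .*?>/g), (m) => m[0]);

console.log(imgTags);
```

Because matchAll returns an iterator of match objects, Array.from with a mapping function turns it into the array of tag strings in one step.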

[Please check out this post](https://stackoverflow.com/a/1732454/5066625) – Miroslav Glamuzina, Apr 28 '19 at 17:58
Thanks a lot @Hein Nguyen, it works. Now I need to take out all the p tags and put them in another array, but leave the ones that have a tag inside them. I am not very good at regular expressions, sadly. Thanks for your time – user10867452, Apr 28 '19 at 18:40

score 0 · Answer 2 · 2019-04-29T00:14:31.913

This is the JS regex for the img tag

<img\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>

JS demo

var str = '<p>↵   This is the cap you unscrew to open when you refuel your car↵</p>↵↵<p>↵ New line↵</p>↵↵<p>↵ <img alt="blah" src="https://www.imgone.com/wp-content/uploads/2011/04/Tyre-Illustration-500.jpg" />↵</p>Random Text <img alt="blah" src="https://www.imgtwo.com/wp-content/uploads/2011/04/Tyre-Illustration-500.jpg" />'

var result = str.match( /<img\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>/g );
console.log(result)
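
To illustrate what the quoted-string alternatives in this pattern are for, here is a small comparison sketch. The sample tag and the variable names are made up for illustration. The tag has a > inside an attribute value: the simpler /<img .*?>/g stops there, while the longer pattern consumes quoted values whole before looking for the closing >.

```javascript
// Hypothetical input: the alt text contains a ">" character.
const sample = '<p><img alt="a > b" src="x.png" /></p>';

const simple = /<img .*?>/g;
const quoteAware = /<img\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>/g;

const simpleTags = sample.match(simple);
const quoteAwareTags = sample.match(quoteAware);

console.log(simpleTags);     // the match ends at the ">" inside the alt value
console.log(quoteAwareTags); // the match covers the whole tag
```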

edited Apr 29 '19 at 00:14

answered Apr 28 '19 at 17:54

[Please check out this post](https://stackoverflow.com/a/1732454/5066625) – Miroslav Glamuzina Apr 28 '19 at 17:58
That is some impressive regex there, truly. But even if it does work (for most cases), it still should not be used when there are ways to traverse document nodes. – Miroslav Glamuzina Apr 29 '19 at 00:37
@MiroslavGlamuzina - Let me reframe what you should say: `That is a regex designed for markup tag parsing. For any case, it works better than an HTML/XML parser. It should always be used when just parsing tags because it will not stop if it encounters an ill-formed segment. Instead, it matches the valid part and moves on to the next. This regex is the most intelligent tag parser there is. There is no tag that it cannot parse, and do it correctly.` – Apr 29 '19 at 14:49
@MiroslavGlamuzina - (cont'd) `Since its core is stable, multiple variants can be used to find or replace any single/multiple attribute-values or any standalone values within a tag, as well as entirely redoing sections. It can also be used to correct ill-formed tags, and multiple other uses` Nobody is trying to put the DOM out of business, but parsing tags is not a language dependency, nor does it have anything to do with balanced text. – Apr 29 '19 at 14:55

Miroslav Glamuzina · Answer 3 · 2019-04-29T00:35:05.733

You can use document.createElement() to create a container element that holds all the HTML in str. After setting the container's innerHTML to the str value, you can iterate through the children of the element you just created and collect every <img> element.

Updated to recursively get elements

let str = '<p>This is the cap you unscrew to open when you refuel your car</p><p>New line↵</p><p> <img alt="blah" src="https://www.imgone.com/wp-content/uploads/2011/04/Tyre-Illustration-500.jpg" /></p>Random Text <img alt="blah2" src="https://www.imgtwo.com/wp-content/uploads/2011/04/Tyre-Illustration-500.jpg"/>';

// Create a container for the HTML above
let el = document.createElement('div');
// Put string in innerHTML of container 
el.innerHTML = str;

// Function to recursively collect elements with the given tag name (as document Nodes)
const getTags = (el, tagName) => Array.from(el.children).reduce((acc, node) => {
  if (node.tagName === tagName.toUpperCase()) {
    acc.push(node);
  }
  // Recurse even when the node itself matched, so nested matches are not skipped
  return [...acc, ...getTags(node, tagName)];
}, []);

// Result
console.log(getTags(el, 'img'));
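
If plain strings like those in the question are wanted rather than nodes, a shorter variant might combine querySelectorAll with outerHTML. This is only a sketch that assumes a browser environment; the helper name getImgTagStrings is made up here. Note that the browser re-serializes each tag, so an entry comes back in the form <img alt="blah" src="..."> without the trailing slash.

```javascript
// Hypothetical helper: returns the markup of every <img> under a container.
// querySelectorAll searches all descendants, so no manual recursion is needed.
const getImgTagStrings = (container) =>
  Array.from(container.querySelectorAll('img'), (img) => img.outerHTML);

// Browser-only usage; skipped where no DOM is available (e.g. plain Node.js).
if (typeof document !== 'undefined') {
  const container = document.createElement('div');
  container.innerHTML = '<p><img alt="blah" src="https://www.imgone.com/wp-content/uploads/2011/04/Tyre-Illustration-500.jpg" /></p>';
  console.log(getImgTagStrings(container));
}
```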

Please do not use regex to parse HTML; see this post.
