Remove everything except image tag from string using regular expression

Question

I have a string that contains HTML, and I need to remove everything from it except the image tags.

Currently I am using this code:

$e->outertext = "<p class='images'>".str_replace(' ', ' ', str_replace('Â','',preg_replace('/#.*?(<img.+?>).*?#is', '',$e)))."</p>";

It serves my purpose but is very slow to execute. Any other way to do the same thing would be appreciated.

Your request isn't clear. What is the input and what is the required output? — Gil Peretz, Sep 09 '15 at 12:06
How do I remove everything except images using a regular expression? @GilPeretz — santosh, Sep 09 '15 at 12:11
Your request is to remove everything from an HTML document except images, but what do you mean by image? An image's tag? An image's path? An image's name? You should also add an example to your question: a string containing the HTML elements and the result you expect from that string. — Pedro Pinheiro, Sep 09 '15 at 13:03
The image tags should be kept and everything else should be removed. @PedroPinheiro — santosh, Sep 09 '15 at 13:14

score 0 · Accepted Answer · edited May 23 '17 at 12:14

The code you provided does not seem to work as it should, and the regex itself is malformed: the leading / is treated as the pattern delimiter and is never closed. Remove that initial slash so that # acts as the delimiter, like this: #.*?(<img.+?>).*?#is.

Your approach is to remove everything and leave just the image tags, which is not a good way to do it. A better way is to capture all the image tags and then use the matches to construct the output. First, let's capture the image tags. That can be done using this regex:

/<img.*>/Ug

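The U modifier inverts the greediness of the quantifiers, so .* becomes lazy instead of greedy and the match ends at the first > it finds. Note that PHP's preg functions do not accept a g modifier; preg_match_all already returns every match, so drop the g when using this pattern in PHP code.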

DEMO1
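To make the effect of the U modifier concrete, here is a minimal sketch that compares the greedy pattern, the ungreedy one, and a negated character class that behaves the same way without the modifier. The sample string and the variable names are made up for illustration and are not taken from the question.

```php
<?php
// Hypothetical sample input, not taken from the question.
$html = '<p>a <img src="a.png"> b <img src="b.png"> c</p>';

// Greedy: .* runs to the last > in the string, giving one oversized match.
preg_match_all('/<img.*>/', $html, $greedy);

// U modifier: quantifiers are lazy by default, so each tag matches on its own.
preg_match_all('/<img.*>/U', $html, $ungreedy);

// Alternative without the modifier: match anything that is not a >.
preg_match_all('/<img[^>]*>/', $html, $negated);

echo count($greedy[0]), "\n";   // 1
echo count($ungreedy[0]), "\n"; // 2
echo count($negated[0]), "\n";  // 2
```

The negated class also matches tags that span several lines, which .* does not do unless the s modifier is added.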

Now, to construct the output, let's use the function preg_match_all and put the results in a string. That can be done using the following code:

<?php
// defining the input
$e = 
'<div class="topbar-links"><div class="gravatar-wrapper-24">
<img src="https://www.gravatar.com/avatar" alt="" width="24" height="24"     class="avatar-me js-avatar-me">
</div>
</div> <img test2> <img test3> <img test4>';
// defining the regex
$re = "/<img.*>/U";
// put all matches into $matches
preg_match_all($re, $e, $matches);
// start creating the result
$result = "<p class='images'>";
// loop to get all the images
for($i=0; $i<count($matches[0]); $i++) {
    $result .= $matches[0][$i];
}
// print the final result
echo $result."</p>";

DEMO2

A further way to improve that code is to use a functional style (array_reduce, for example) instead of the explicit loop. I'll leave that as homework.
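For anyone who wants to compare their attempt, one possible version is sketched below. It assumes the same pattern as above and a shortened, made-up input string. implode is shown next to array_reduce because it performs the same concatenation in a single call.

```php
<?php
// Shortened, hypothetical input in the same spirit as the example above.
$e = '<div><img test1></div> <img test2> text <img test3>';

preg_match_all('/<img.*>/U', $e, $matches);

// array_reduce folds the matched tags into one string, starting from ''.
$reduced = array_reduce($matches[0], function ($carry, $tag) {
    return $carry . $tag;
}, '');

// implode joins the same array with an empty separator.
$joined = implode('', $matches[0]);

echo "<p class='images'>" . $joined . "</p>";
```

Both variables end up holding the same string, so which one to use is mostly a matter of taste.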

Note: Another way to accomplish this is to parse the HTML document and use XPath to find the elements. Check out this answer for more information.
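As a rough sketch of that parsing approach, the snippet below uses PHP's DOMDocument and DOMXPath classes. It assumes the dom extension is enabled, and the input string is made up for illustration. It is not the code from the linked answer.

```php
<?php
// Hypothetical sample input, not taken from the question.
$html = '<div><p>text <img src="a.png" alt="A"></p><span><img src="b.png"></span></div>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Select every img element, wherever it sits in the tree.
$xpath = new DOMXPath($doc);
$result = "<p class='images'>";
foreach ($xpath->query('//img') as $img) {
    // Serialize just this node back to HTML.
    $result .= $doc->saveHTML($img);
}
$result .= "</p>";

echo $result;
```

Unlike the regex, this does not depend on how the tags are spaced or broken across lines, because the parser works on the element tree rather than on the raw text.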

Thanks @pedro for the explanation. — santosh, Sep 11 '15 at 05:44
