1

I am looking for an easy and efficient way to remove a specific image from an article. All that I know is the image URL of the image that I need to remove.

  • The image may or may not use different attributes.
  • The image may or may not exist at all in the article.
  • There might be other images (with different URLs) in the article.

My choice would be either regex or DOMDocument, probably using an HTML5 parser like https://github.com/Masterminds/html5-php.

My regex skills are not that good, and I'm not sure regex is a good idea here, because I have read that it should be avoided for parsing HTML. What I have so far removes every image tag; I don't know how to restrict it to a specific src URL.

$img_src = 'http://www.example.org/image_to_be_removed.jpg';

$article = '<h1>Test article with HTML5 tags</h1>
<nav><a href="/link1/">Link 1</a></nav>
<p>This is an example article. The article may or may not include html5 tags, images and other things.</p>
<img src="http://www.example.org/image_to_be_removed.jpg">
<p>More example text.</p>';

$article = preg_replace("/<img[^>]+\>/i", "", $article);
echo $article;

I haven't dug into the DOMDocument solution yet, because I am not sure whether it's even possible, or whether regex would be considered best practice.

Kevin M
  • Why not use javascript? – Robert May 20 '18 at 02:48
  • I can't use JavaScript because it's a server-side script in WordPress. Before the article is added to WordPress, I need to check whether the article itself contains the featured image that was set and, if so, remove it from the article. – Kevin M May 20 '18 at 02:52
  • Even if regex is good, my regex skills are not good enough to search by src URL of an image. – Kevin M May 20 '18 at 02:53
  • Well if it's a specific image use a simple str_replace, that way you avoid removing other images you may not want to remove. – Robert May 20 '18 at 03:01
  • @Robert Rocha Do you have an example? How can I use str_replace to find an image if I only have the URL and don't know whether the tag uses attributes, styles or classes? If I use str_replace on just the URL, I'll end up with the image tag intact but no src. – Kevin M May 20 '18 at 03:05
  • Do you want the entire img tag removed or just the url to the image? – Joseph_J May 20 '18 at 03:08
  • let me see if I can come up with something based on the criteria you listed above – Robert May 20 '18 at 03:08
  • I believe he wants the entire image tag removed, no? – Robert May 20 '18 at 03:08
  • Yes, I would like the entire image tag removed. – Kevin M May 20 '18 at 03:12
  • Is this all of the "Article" code? The img tag has no parent. – Robert May 20 '18 at 03:23

4 Answers

3

Use preg_quote to escape the URL before inserting it into the pattern:

$article = preg_replace("/<img[^>]+src=\"" . preg_quote($img_src, '/') . "\"[^>]*\>/i", "", $article);

Regex Demo

php Demo
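
As a rough sketch of how the same idea could be wrapped in a reusable function: the helper below (remove_image_by_src is a made-up name, not something from PHP or WordPress) also tolerates single-quoted src values and whitespace around the equals sign. It still assumes that no attribute value inside the tag contains a literal >.

```php
<?php
// Hypothetical helper (not from the answer above): removes every <img> tag
// whose src attribute equals $imgSrc, whatever other attributes it carries.
function remove_image_by_src(string $html, string $imgSrc): string
{
    // preg_quote() escapes regex metacharacters in the URL (".", "?", "+", ...)
    // and, via its second argument, the "/" pattern delimiter.
    $quoted = preg_quote($imgSrc, '/');

    // <img\b        the tag name, followed by a word boundary
    // [^>]*?\s      any attributes before src, ending in whitespace
    // src\s*=\s*    the attribute name and "="
    // (["'])...\1   the URL in matching double or single quotes
    // [^>]*>        the rest of the tag up to the closing ">"
    $pattern = '/<img\b[^>]*?\ssrc\s*=\s*(["\'])' . $quoted . '\1[^>]*>/i';

    // preg_replace() returns null on a regex error; fall back to the input.
    return preg_replace($pattern, '', $html) ?? $html;
}

$article = '<p>Intro</p>'
    . '<img class="hero" src="http://www.example.org/image_to_be_removed.jpg" alt="">'
    . '<img src="http://www.example.org/other.jpg">'
    . '<p>More example text.</p>';

echo remove_image_by_src($article, 'http://www.example.org/image_to_be_removed.jpg');
```

If the image is not in the article at all, the pattern simply does not match and the article comes back unchanged.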

Matt.G
  • Looked at your Demo at regex101.com and tried your logic with +alt instead of +src. This worked perfectly for my problem! Learned something new today. <img[^>]+alt="Picture"[^>]*\> – Karel Jul 22 '21 at 14:43
0

You can try this. It seems to test ok. At any rate it should give you an idea as to how to proceed.

$img_src = 'http://www.example.org/image_to_be_removed.jpg';

$article = '<h1>Test article with HTML5 tags</h1>
<nav><a href="/link1/">Link 1</a></nav>
<p>This is an example article. The article may or may not include html5 tags, images and other things.</p>
<img style="width:100px;" src="http://www.example.org/image_to_be_removed.jpg" class="myClass">
<p>More example text.</p>';

$article = preg_replace('/\s{1,}/', ' ', $article);  // Collapse every run of whitespace to a single space so the trailing \s? below is enough.
$img_src = preg_quote($img_src, '/'); // Escape slashes, dots and other regex metacharacters in the URL.
$regex = '/<img\b[^>]*\ssrc="' . $img_src . '"[^>]*>\s?/'; // [^>]* keeps the match inside a single tag.
$article = preg_replace($regex, '', $article);
echo $article;
Joseph_J
0

You can try the following with str_replace:

<?php
$img_src = 'http://www.example.org/image_to_be_removed.jpg';

$article = '<h1>Test article with HTML5 tags</h1>
<nav><a href="/link1/">Link 1</a></nav>
<p>This is an example article. The article may or may not include html5 tags, images and other things.</p>
<img src="http://www.example.org/image_to_be_removed.jpg">
<p>More example text.</p>';
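// Remove the whole tag. str_replace() matches literally, so this only works when the tag has exactly this form (no other attributes).
$new = str_replace('<img src="' . $img_src . '">', '', $article);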
echo $article;
echo '<br/>';
echo $new;
?>

The script echoes both the original article and the result of str_replace so you can see the difference. Other functions, such as strtr and preg_replace, can do similar replacements; use whichever suits your case.

0

Parsing HTML with regex is not recommended.

As you suggested, you could use DOMDocument, or alternatively PHP Simple HTML DOM Parser.

Because you state that "All that I know is the image URL of the image that I need to remove", you can select the img elements with XPath (or by tag name) and check each one's src attribute against that URL.

Example DOMDocument:

$img_src = 'http://www.example.org/image_to_be_removed.jpg';
$article = '<h1>Test article with HTML5 tags</h1>
<nav><a href="/link1/">Link 1</a></nav>
<p>This is an example article. The article may or may not include html5 tags, images and other things.</p><img src="http://www.example.org/image_to_be_removed.jpg"><img src="http://www.example.org/image_not_to_be_removed.jpg">
<p>More example text.</p>';
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($article);
$xpath = new DOMXPath($dom);
$elements = $xpath->query("//img");
foreach ($elements as $element) {
    if ($element->getAttribute("src") === $img_src) {
        $element->parentNode->removeChild($element);
    }
}
echo $dom->saveHTML();
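
Note that loadHTML() builds a full document, so saveHTML() also prints a doctype and html/body wrappers around the article. A possible way to get only the article markup back, sketched under the assumption that the article is a UTF-8 body fragment (remove_image_from_fragment is a hypothetical name):

```php
<?php
// Hypothetical helper: the name and the wrapper approach are assumptions,
// not part of the answer above.
function remove_image_from_fragment(string $html, string $imgSrc): string
{
    $dom = new DOMDocument();
    // Collect libxml warnings (e.g. about HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);

    // The meta tag tells libxml to read the fragment as UTF-8; the explicit
    // <body> wrapper gives us a known container to read the result back from.
    $dom->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body>' . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // DOMXPath::query() returns a static node list, so removing nodes
    // while iterating over it is safe.
    foreach ($xpath->query('//img') as $img) {
        if ($img->getAttribute('src') === $imgSrc) {
            $img->parentNode->removeChild($img);
        }
    }

    // Serialize only the children of <body>, not the whole document.
    $body = $dom->getElementsByTagName('body')->item(0);
    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}
```

Calling remove_image_from_fragment($article, $img_src) then returns a string without the wrappers. libxml may normalize the markup slightly (for example whitespace between block elements), so don't expect byte-for-byte identical output.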

Example PHP Simple HTML DOM Parser using simple_html_dom.php:

$htmlDom = str_get_html($article);
foreach ($htmlDom->find('img[src=http://www.example.org/image_to_be_removed.jpg]') as $item) {
    $item->outertext = '';
}
$htmlDom->save();
echo $htmlDom;
The fourth bird