I have extracted a string value from my SQL table, and it looks like this:

<p>Commodity Exchange on 5 April 2016 settled as following graph:</p> 
<p><img alt=\"\" src=\"ckeditor/plugins/imageuploader/uploads/986dfdea.png\" 
style=\"height:163px; width:650px\" /></p></p> 
<p>end of string</p>

I want to get the image name 986dfdea.png from inside the HTML tag (the string contains many <p></p> tags, and I want to be able to tell which one contains an image), and then replace the whole tag with a placeholder such as '#image1'.

Eventually it would become this:

<p>Commodity Exchange on 5 April 2016 settled as following graph:</p> 
#image1 
<p>end of string</p>

I'm developing an API for mobile apps, but I'm a beginner in PHP and still can't achieve this, even after reading these references:

PHP/regex: How to get the string value of HTML tag?

How to extract img src, title and alt from html using php?

Please help.

felixwcf
  • Why do you need `986dfdea.png` if you don't use it afterwards? You just need to replace the whole `<p><img ... /></p>` block with `#image1`, right? And is there really a doubled `</p></p>`, or just one? – Pedro Lobito May 13 '16 at 15:25
  • No, I don't need it. I have the image name from the DB, so eventually I'll replace #image with the image name inside my mobile apps. I have multiple tags which contain images, but I will increment the number in the placeholder, like #image2. As long as I can achieve the above result, I can sort the rest out with a for loop. – felixwcf May 13 '16 at 15:40
  • So `#986dfdea.png` would be good? Which PHP version are you using? – Pedro Lobito May 13 '16 at 15:48
  • The image name is randomly generated. I suggest searching for 'imageuploader', or just '.png', because on my website the articles are written with an open-source word editor. The PHP version is 5.4. – felixwcf May 13 '16 at 15:56

1 Answer

Yes, you could use a regex and you'd need way less code, but we shouldn't parse HTML with a regex, so here's what you need:

  1. Your string contains invalid HTML (</p></p>), so we use tidy_repair_string to clean it.
  2. Use DOMXpath() to query for img tags that sit directly inside p tags (//p/img).
  3. Remove any stray " characters, then get the image filename with getAttribute("src") and basename.
  4. Use createTextNode to create a new text node whose value is #imagename.
  5. Use replaceChild to replace the img with the text node created above, leaving the surrounding p in place.
  6. Clean up the !DOCTYPE, html and body tags automatically generated by loadHTML.

<?php
$html = <<< EOF
<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>
<p><img alt=\"\" src=\"ckeditor/plugins/imageuploader/uploads/986dfdea.png\"
style=\"height:163px; width:650px\" /></p></p>
<p>end of string</p>
EOF;



$html = tidy_repair_string($html,array(
                           'output-html'   => true,
                           'wrap'           => 80,
                           'show-body-only' => true,
                           'clean' => true,
                           'input-encoding' => 'utf8',
                           'output-encoding' => 'utf8',
                                          ));


$dom = new DOMDocument();
$dom->loadHtml($html);



$x = new DOMXpath($dom);
foreach ($x->query('//p/img') as $pImg) {
    // Get the image file name, stripping any stray quotes left in src
    $imgFileName = basename(str_replace('"', "", $pImg->getAttribute("src")));
    // Replace the <img> element with a "#filename" text node
    $replace = $dom->createTextNode("#$imgFileName");
    $pImg->parentNode->replaceChild($replace, $pImg);
}

// Clean up once, after every image has been replaced
// (doing this inside the loop would break on a second image):
// loadHTML causes a !DOCTYPE tag to be added, so remove it
$dom->removeChild($dom->firstChild);
// it also wraps the code in <html><body></body></html>, so remove that
$dom->replaceChild($dom->firstChild->firstChild, $dom->firstChild);
echo str_replace(array("<body>", "</body>"), "", $dom->saveHTML());

Output:

<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>
<p>#986dfdea.png</p>
<p>end of string</p>
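
If you want the numbered placeholders from the question (#image1, #image2, ...) and want the whole p replaced rather than just the img, a variation along these lines might work. Treat it as a sketch under a few assumptions, not as part of the code above: the function name replaceImageParagraphs and the wrapper div are made up for illustration, it assumes the stored string really contains backslash-escaped quotes (\"), and it skips tidy and relies on libxml's own error recovery for the stray </p>. It does no special encoding handling, so non-ASCII text may need extra care.

```php
<?php
// Hypothetical helper (not from the original answer): replaces every <p> that
// directly contains an <img> with a numbered placeholder and returns
// array($newHtml, $placeholderToFileNameMap).
function replaceImageParagraphs($html)
{
    // Assumption: quotes are stored as \" in the database, so undo that first.
    $html = str_replace('\"', '"', $html);

    // Collect parser warnings (e.g. for the stray </p>) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // Wrap the fragment in a known container so it can be read back
    // without the <html>/<body> tags that loadHTML adds.
    $dom->loadHTML('<div id="wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($dom);
    $images = array();
    $n      = 0;

    // p[img] selects the paragraphs themselves, not the images inside them.
    foreach ($xpath->query('//p[img]') as $p) {
        $n++;
        $img = $xpath->query('img', $p)->item(0);
        $images["#image$n"] = basename($img->getAttribute('src'));
        $p->parentNode->replaceChild($dom->createTextNode("#image$n"), $p);
    }

    // Serialize only the children of the wrapper element.
    $wrapper = $xpath->query('//div[@id="wrapper"]')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }

    return array($out, $images);
}

// Usage with the string from the question (nowdoc keeps the backslashes verbatim).
$sample = <<<'EOF'
<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>
<p><img alt=\"\" src=\"ckeditor/plugins/imageuploader/uploads/986dfdea.png\"
style=\"height:163px; width:650px\" /></p></p>
<p>end of string</p>
EOF;

list($newHtml, $imageMap) = replaceImageParagraphs($sample);
echo $newHtml, "\n";
print_r($imageMap);
```

The returned map pairs each placeholder with its file name (here #image1 with 986dfdea.png), so the mobile app can swap the placeholders back for images in whatever order they appear.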


Pedro Lobito
  • It works. This is clean and beautiful. I've learned so much from your code, and it has saved me a lot of hair-pulling too. Thank you so much for your help. – felixwcf May 14 '16 at 15:00