
I am creating a WordPress function and need to determine whether an image in the content is wrapped in an <a> tag that links to a PDF or DOC file, e.g.:

<a href="www.site.com/document.pdf"><img src="../images/image.jpg" /></a>

How would I go about doing this with PHP?

Thanks

Andy Lester
badcoder

2 Answers


I would strongly advise against using a regular expression for this. Besides being more error-prone and less readable, a regex does not let you manipulate the content easily.

You would be better off loading the content into a DOMDocument, retrieving all <img> elements and checking whether their parents are <a> elements. All you have to do then is check whether the value of the href attribute ends with the desired extension.

A very crude implementation would look a bit like this:

<?php

$sHtml = <<<HTML
<html>
<body>
    <img src="../images/image.jpg" />
    <a href="www.site.com/document.pdf"><img src="../images/image.jpg" /></a>
    <a href="www.site.com/document.txt"><img src="../images/image.jpg" /></a>
    <p>this is some text <a href="site.com/doc.pdf"> more text</p> 
</body>
</html>
HTML;

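$oDoc = new DOMDocument();
libxml_use_internal_errors(true); // collect warnings about malformed markup instead of printing them
$oDoc->loadHTML($sHtml);
$oNodeList = $oDoc->getElementsByTagName('img');

foreach($oNodeList as $t_oNode)
{
    // Only images whose direct parent is an <a> element are of interest
    if($t_oNode->parentNode->nodeName === 'a')
    {
        $sLinkValue = $t_oNode->parentNode->getAttribute('href');
        // Everything from the last dot onwards, e.g. ".pdf"
        $sExtension = substr($sLinkValue, strrpos($sLinkValue, '.'));

        echo '<li>I am wrapped in an anchor tag '
           . 'and I link to a ' . $sExtension . ' file</li>'
        ;
    }
}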
?>

I'll leave an exact implementation as an exercise for the reader ;-)
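
For anyone who wants a starting point, here is a minimal sketch of how the approach above could be wrapped in a reusable function. The function name, its signature and the shape of the returned array are my own assumptions, not part of the answer above or of WordPress. It also looks only at the path part of the URL, so a link such as "document.pdf?download=1" should still match.

```php
<?php

/**
 * Hypothetical helper (name and signature are illustrative only).
 * Returns one entry per <img> whose direct parent is an <a> that
 * links to a file with one of the given extensions.
 */
function findImagesLinkedToDocuments(string $html, array $extensions = ['pdf', 'doc']): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Collect parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $matches = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $parent = $img->parentNode;
        if (!$parent instanceof DOMElement || $parent->nodeName !== 'a') {
            continue; // not directly wrapped in an anchor
        }

        $href = $parent->getAttribute('href');
        // Use only the URL path so query strings do not hide the extension.
        $path = (string) parse_url($href, PHP_URL_PATH);
        $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

        if (in_array($extension, $extensions, true)) {
            $matches[] = ['src' => $img->getAttribute('src'), 'href' => $href];
        }
    }

    return $matches;
}
```

Calling it with the post content gives an array you can test with empty() or loop over. Note that it only matches images that are direct children of the link, just like the crude version.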

Potherca

Here is DOM-parser-based code that you can use:

$html = <<< EOF
<a href="www.site.com/document.pdf"><img src="../images/image.jpg" /></a>
<img src="../images/image1.jpg" />
<a href="www.site.com/document.txt"><IMG src="../images/image2.jpg" /></a>
<a href="www.site.com/document.doc"><img src="../images/image3.jpg" /></a>
<a href="www.site.com/document1.pdf">My PDF</a>
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$nodeList = $doc->getElementsByTagName('a');
foreach ($nodeList as $node) {
    // Does this anchor have an <img> as a direct child?
    $hasImage = false;
    foreach ($node->childNodes as $child) {
        if ($child->nodeName == 'img') {
            $hasImage = true;
            break;
        }
    }
    if (!$hasImage) {
        continue;
    }
    // Report the link if its href ends in .doc or .pdf (case-insensitive)
    $href = $node->getAttribute('href');
    if (preg_match('/\.(doc|pdf)$/i', $href)) {
        echo $href .
             " - Image is wrapped in a link to a PDF or DOC file\n";
    }
}

Live Demo: http://ideone.com/dwJNAj
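
The loop above only looks at direct children of each link, so an image nested one level deeper (for example inside a <span>) would be missed. If that matters, one possible variation is to let DOMXPath do the selection instead. This is a sketch under that assumption, and the function name and return value are hypothetical:

```php
<?php

/**
 * Hypothetical helper (illustrative name). Returns the href of every
 * link that ends in .doc or .pdf and contains an <img> at any depth.
 */
function documentLinksWrappingImages(string $html): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Collect parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $hrefs = [];

    // Every <a> with an href attribute that has an <img> somewhere beneath it.
    foreach ($xpath->query('//a[@href][.//img]') as $anchor) {
        $href = $anchor->getAttribute('href');
        if (preg_match('/\.(doc|pdf)$/i', $href)) {
            $hrefs[] = $href;
        }
    }

    return $hrefs;
}
```

The regular expression is the same one used above, so it still expects the href to end with the extension.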

anubhava