I want to find this string and perform some operations on it:

    <img src="images/video.png" border="0" alt="60" />

  1. Find every occurrence in context
  2. Retrieve the alt attribute (in this case 60) and work with this number
  3. Replace the whole image

I've been playing around with regular expressions but it obviously doesn't work yet:

    if (preg_match_all('<img src="images/video.png" border="0" alt="[^"]*">', $content, $regs)) {
        for ($i = 0; $i < count($regs[0]); $i++) {
            echo $regs[0][$i] . "<br>";
            $id = preg_replace('alt="[^"]*"', "$1", $regs[0][$i]);
            echo "The id: " . $id . "<br>";
        }
    }

Using the PHP [DOMDocument](http://www.php.net/manual/en/class.domdocument.php) and [DOMXPath](http://www.php.net/manual/en/class.domxpath.php) classes is a far preferable and better-engineered way of doing this. Would you be willing to abandon the regex approach? – Jon Jul 19 '11 at 09:22

6 Answers


How about parsing the DOM using PHP Simple HTML DOM Parser?

You can download the script from here: http://sourceforge.net/projects/simplehtmldom/files/

Include that script in your current script like this:

    include_once("simple_html_dom.php");

Then you can loop through all images in your HTML and do what you want with them:

    // str_get_html() parses an HTML string into a Simple HTML DOM object
    $html = str_get_html("Your HTML code");

    foreach ($html->find('img') as $element) {

        // Do something with the alt text
        $alt_text = $element->alt;

        // Replace the image
        $element->src = 'new_src';
        $element->alt = 'new_alt';

    }

Without a third-party library, using PHP's built-in DOMDocument:

    // Load the HTML
    $html = "Your HTML code";
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Loop through all images
    $images = $dom->getElementsByTagName('img');
    foreach ($images as $image) {

        // Do something with the alt
        $alt = $image->getAttribute('alt');

        // Replace the image
        $image->setAttribute("src", "new_src");
        $image->setAttribute("alt", "new_alt");

    }

    // Get the new HTML string
    $html = $dom->saveHTML();
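
The loop above only rewrites attributes in place. If the goal is to swap out the whole `<img>` element (step 3 of the question), something along these lines might work. This is only a sketch: the sample HTML, the `<a>` replacement markup, and the `video.php?id=` URL are made up for illustration and are not from the question.

```php
<?php
// Sample input (assumed); the second image is unrelated and should stay untouched.
$html = '<p><img src="images/video.png" border="0" alt="60" /> text '
      . '<img src="images/other.png" alt="x" /></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Copy the live DOMNodeList into an array first: replacing nodes while
// iterating the live list can skip elements.
$images = iterator_to_array($dom->getElementsByTagName('img'));

foreach ($images as $image) {
    if ($image->getAttribute('src') !== 'images/video.png') {
        continue;
    }

    // Step 2: work with the number stored in alt.
    $id = (int) $image->getAttribute('alt');

    // Step 3: build the replacement (hypothetical markup) and swap it in.
    $link = $dom->createElement('a', 'Watch video ' . $id);
    $link->setAttribute('href', 'video.php?id=' . $id);
    $image->parentNode->replaceChild($link, $image);
}

// saveHTML() on the whole document also emits a doctype and <html><body> wrappers.
$result = $dom->saveHTML();
echo $result;
```

Copying the node list with `iterator_to_array()` is the important part; the rest can be adapted to whatever replacement markup is needed.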
betamax

You should use DOM to parse XML/HTML...

Alex Ackerman

Regex isn't the recommended way to do this, since HTML (malformed HTML in particular) is notoriously hard to match accurately with regular expressions. Look into DOMDocument instead: http://php.net/manual/en/class.domdocument.php
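
To illustrate the point about malformed HTML, here is a small sketch of DOMDocument reading a deliberately sloppy fragment. The input string is invented for this example; `libxml_use_internal_errors()` is used so that parser complaints are collected instead of being raised as PHP warnings.

```php
<?php
// Deliberately sloppy markup (assumed sample): stray </span>, unclosed <p>,
// unquoted attribute value.
$html = '<p>Intro</span><img src="images/video.png" border=0 alt="60"><p>More text';

// Collect parser errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Discard the collected errors and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Gather the alt value of every image the parser found.
$alts = [];
foreach ($dom->getElementsByTagName('img') as $image) {
    $alts[] = $image->getAttribute('alt');
}

print_r($alts);
```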

Other alternatives are discussed here:

Robust and Mature HTML Parser for PHP

Edgar Velasquez Lim
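
If the markup is well-formed XML, SimpleXML with an XPath query can read the attribute (interactive `php -a` session):

    php > $xml = new SimpleXmlElement('<img src="images/video.png" border="0" alt="60" />');
    php > foreach($xml->xpath('//@alt') as $alt) echo "Id is: ",(string)$alt,"\n";
    Id is: 60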
yankee

[Expanding my comment into an answer]

Here's some sample code to get you started on PHP's DOM library:

    $html = '...';
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Build the XPath query (you can specify very complex criteria here)
    $images = $xpath->query('//img[@src="images/video.png" and @border="0"]');

    foreach($images as $image) {
        echo 'This image has alt = '.
             $image->attributes->getNamedItem('alt')->nodeValue.
             '<br />';
    }

You can look at an XPath tutorial if you want to customize the query with more advanced logic.
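
As one example of such customization, the query could also filter on the numeric value of `alt`. This is a sketch only: the sample HTML and the threshold of 50 are assumptions chosen for illustration.

```php
<?php
// Assumed sample input with three images.
$html = '<div>'
      . '<img src="images/video.png" border="0" alt="60" />'
      . '<img src="images/video.png" border="0" alt="12" />'
      . '<img src="images/photo.png" border="0" alt="99" />'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Same src filter as before, plus a numeric test on the alt value.
$images = $xpath->query('//img[@src="images/video.png" and number(@alt) > 50]');

// Collect the matching ids as integers.
$ids = [];
foreach ($images as $image) {
    $ids[] = (int) $image->getAttribute('alt');
}

print_r($ids);
```

XPath 1.0's `number()` converts the attribute string to a number, so the comparison is numeric rather than textual.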

Jon

You should use this regex:

    <img src="images/video.png" border="0" alt="([^"]*)" />

But if you also want to accept this input

    <img alt="60" src="images/video.png" border="0" />

and any other possible permutation of the attributes, then it's better to match the image tag on its own and then match the alt attribute within its contents.
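
A minimal sketch of that two-step approach, using `preg_replace_callback()`. Note that PCRE patterns in PHP need delimiters (`~` here, since the pattern contains `/`). The sample input and the replacement markup are assumptions, and the tag pattern would still break on attribute values that contain `>`.

```php
<?php
// Assumed sample input: attributes in different orders, plus an unrelated image.
$content = '<p><img src="images/video.png" border="0" alt="60" />'
         . '<img alt="61" border="0" src="images/video.png">'
         . '<img src="images/other.png" alt="62" /></p>';

$result = preg_replace_callback(
    '~<img\b[^>]*>~i', // step 1: match any <img ...> tag on its own
    function (array $m) {
        $tag = $m[0];

        // Only touch the video placeholder image.
        if (!preg_match('~\bsrc="images/video\.png"~i', $tag)) {
            return $tag;
        }
        // Step 2: pull the alt value out of the tag, wherever it appears.
        if (!preg_match('~\balt="([^"]*)"~i', $tag, $alt)) {
            return $tag;
        }

        $id = (int) $alt[1];
        // Hypothetical replacement markup.
        return '<a href="video.php?id=' . $id . '">Watch video ' . $id . '</a>';
    },
    $content
);

echo $result;
```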

aercolino