
I have to extract this particular piece of HTML using PHP. Since the cell has no class or unique ID, I tried to select it by its bgcolor attribute, but without success.

<td bgcolor="#F5EC97" width="154" valign="top" align="left" height="55">
             <font face="Verdana, Arial, Helvetica, sans-serif" size="1"><b><font color="#CC6633">CITY</font></b><br>
              <b>xyz</b><br>
              xyz<br>
              Tel. 555/22327<br>
              &nbsp;

    </td>

This is the code I've tried:

$res = $html->find('td[bgcolor=#F5EC97]');

Any suggestions?

Rahul
cesko80
  • *(related)* [Best methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon Oct 19 '10 at 10:48

2 Answers


Parse into a DOMDocument:

// $html must be a string containing the page markup
$doc = new DOMDocument();
$doc->loadHTML($html);

Then pick the element(s), either with plain DOM getElementsByTagName:

foreach ($doc->getElementsByTagName('td') as $td) {
    if ($td->getAttribute('bgcolor')=='#F5EC97') {
        // do something with $td
    }
}
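As a rough, self-contained sketch of the plain DOM loop: the two-cell table below is made-up sample input, and `$matches` is just an illustrative name. It compares the attribute with `strcasecmp()` so the case of the hex digits does not matter, and rebuilds each cell's inner markup by serializing its child nodes, which is roughly what simple_html_dom exposes as `innertext`.

```php
<?php
// Hypothetical sample input: two cells that differ only in their bgcolor.
$html = '<table><tr>'
      . '<td bgcolor="#F5EC97"><b>CITY</b><br>Tel. 555/22327</td>'
      . '<td bgcolor="#FFFFFF">other</td>'
      . '</tr></table>';

// Legacy markup tends to trigger libxml warnings; collect them silently.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$matches = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    // strcasecmp() returns 0 when both strings are equal ignoring ASCII case.
    if (strcasecmp($td->getAttribute('bgcolor'), '#f5ec97') === 0) {
        // Serialize every child node to get the cell's inner HTML.
        $inner = '';
        foreach ($td->childNodes as $child) {
            $inner .= $doc->saveHTML($child);
        }
        $matches[] = $inner;
    }
}

print_r($matches);
```

Passing a node to `saveHTML()` needs PHP 5.3.6 or later.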

Or with XPath:

$xpath = new DOMXPath($doc);
foreach ($xpath->query("//td[@bgcolor='#F5EC97']") as $td) {
   // do something with $td
}
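For completeness, here is a minimal runnable sketch of the XPath variant using the cell from the question. The surrounding table, the second cell and the name `$cells` are assumptions added for illustration. XPath 1.0 has no lower-case function, so the query uses `translate()` to fold the hex digits A-F to lowercase before comparing; that way it matches whichever case the page uses.

```php
<?php
// The first cell is copied from the question; the wrapper table and the
// second cell are made up so there is something that should NOT match.
$html = <<<'HTML'
<table><tr>
<td bgcolor="#F5EC97" width="154" valign="top" align="left" height="55">
  <font face="Verdana, Arial, Helvetica, sans-serif" size="1"><b><font color="#CC6633">CITY</font></b><br>
  <b>xyz</b><br>
  xyz<br>
  Tel. 555/22327<br>
  &nbsp;
</td>
<td bgcolor="#FFFFFF">other cell</td>
</tr></table>
HTML;

// Old, non-well-formed HTML makes libxml emit warnings; collect them silently.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// translate() maps A-F to a-f, so the comparison ignores hex-digit case.
$query = "//td[translate(@bgcolor, 'ABCDEF', 'abcdef') = '#f5ec97']";

$cells = [];
foreach ($xpath->query($query) as $td) {
    // Turn non-breaking spaces into plain spaces, then collapse whitespace.
    $text = str_replace("\u{00A0}", ' ', $td->textContent);
    $cells[] = trim(preg_replace('/\s+/', ' ', $text));
}

print_r($cells);
```

`textContent` drops the markup and keeps only the text; if the inner HTML is needed instead, serializing the child nodes with `$doc->saveHTML($child)` is one option.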
bobince

Finally got it.

It also works with simple_html_dom, as long as the color code in the selector is written in lowercase, e.g. #f5ec97. The uppercase form does NOT match, even though the color code is uppercase in the original document.

<?php

    require_once("simple_html_dom.php");

    $html = file_get_html('pharma/w_43.htm');
    // The selector value has to be lowercase, even though the page uses #F5EC97.
    foreach ($html->find('td[bgcolor=#f5ec97]') as $article) {
        echo $article->innertext;
    }

?>

cesko80
  • Oh! So it's simple_html_dom... I *did* wonder where you got `find()` from. This seems like a bug to me. – bobince Oct 19 '10 at 14:56