
How can I extract information from an HTML file using DOMDocument in PHP?

The source of my HTML page contains the following. It is the third table on the page, and it is the one I need to work with:

 <table>
 <tbody>
 <tr>
   <td>A</td>
   <td>B</td>
   <td>C</td>
   <td>D</td>
</tr>
<tr>
  <td>1</td>
  <td>2</td>
  <td>3</td>
  <td>4</td>
</tr>
</tbody>
</table>

If my user asks me to show the row with B and D, how should I extract the first row of this table and print it using DOMDocument?

MrCode
femchi
  • Do you just want the first row? or do you want the row that contains `B` and `D` and can they be anywhere in the row or specific columns? – MrCode Jul 12 '13 at 10:47
  • They could be anywhere inside the third table. There are around 30 rows in this table. – femchi Jul 12 '13 at 10:52
  • So you want to get the row that contains `B` and `D` in the second and fourth columns? What if `B` is in the first column and `D` in the second? – MrCode Jul 12 '13 at 10:55
  • B and D stay in the same place in all rows. – femchi Jul 12 '13 at 11:00
  • Are you aware that PHP has an HTML parser? Recommended reading: [How do you parse and process HTML/XML in PHP?](http://stackoverflow.com/q/3577641/367456) – hakre Jul 13 '13 at 07:58
  • No, sir. I'm an amateur ;) – femchi Jul 18 '13 at 05:36

2 Answers


This should do it: the code grabs the third table, loops over its rows, and checks for B and D in the second and fourth columns. When it finds a matching row, it prints each column value and stops looping.

$dom = new DOMDocument();
$dom->loadHTML(.....);

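// get the third table (item() is zero-based and returns null if the page has fewer than three tables)
$thirdTable = $dom->getElementsByTagName('table')->item(2);
if($thirdTable === null)
{
    die('Third table not found.');
}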

// iterate over each row in the table
foreach($thirdTable->getElementsByTagName('tr') as $tr)
{
    $tds = $tr->getElementsByTagName('td'); // get the columns in this row
    if($tds->length >= 4)
    {
        // check if B and D are found in column 2 and 4
        if(trim($tds->item(1)->nodeValue) == 'B' && trim($tds->item(3)->nodeValue) == 'D')
        {
            // found B and D in the second and fourth columns
            // echo out each column value
            echo $tds->item(0)->nodeValue; // A
            echo $tds->item(1)->nodeValue; // B
            echo $tds->item(2)->nodeValue; // C
            echo $tds->item(3)->nodeValue; // D
            break; // don't check any further rows
        }
    }
}
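
A variation on the loop above, offered only as a sketch: wrapping the lookup in a function lets the caller decide what to do with the matching row instead of echoing it in place. The function name `findRow` and its parameters are hypothetical and not part of the original code; the nullable type hints assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the original answer): returns the trimmed cell
// texts of the first row in table number $tableIndex (zero-based) whose second
// and fourth cells equal $second and $fourth, or null if nothing matches.
function findRow(string $html, int $tableIndex, string $second, string $fourth): ?array
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally so imperfect real-world HTML
    // does not flood the output, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $table = $dom->getElementsByTagName('table')->item($tableIndex);
    if ($table === null) {
        return null; // the page has fewer tables than expected
    }

    foreach ($table->getElementsByTagName('tr') as $tr) {
        // gather the text of every td in this row
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->nodeValue);
        }
        if (count($cells) >= 4 && $cells[1] === $second && $cells[3] === $fourth) {
            return $cells; // first matching row wins
        }
    }

    return null;
}
```

With the page source held in a variable such as `$html`, a call like `findRow($html, 2, 'B', 'D')` would return the cell texts of the matching row in the third table, which can then be printed with `implode()` or formatted however the page needs.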
MrCode
  • please review: https://stackoverflow.com/questions/47123769/trying-to-get-td-content-using-domdocument-without-any-success – Imnotapotato Nov 05 '17 at 19:07

This code loads the table from a string and prints the text of every th and td cell:

$table = "<table>
 <tbody>
 <tr>
   <td>A</td>
   <td>B</td>
   <td>C</td>
   <td>D</td>
</tr>
<tr>
  <td>1</td>
  <td>2</td>
  <td>3</td>
  <td>4</td>
</tr>
</tbody>
</table>";
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8"?>' . $table);
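$rows = $doc->getElementsByTagName('tr');
$tds = $doc->getElementsByTagName('td');
$ths = $doc->getElementsByTagName('th');
// a DOMElement cannot be converted to a string directly, so read its text via nodeValue
foreach ($ths as $th) {
    echo "<p> th  = " . $th->nodeValue . " </p>";
}
foreach ($tds as $td) {
    echo "<p> td  = " . $td->nodeValue . " </p>";
}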
Dave
elaz