I want to extract the tables from an HTML page that contains nested <table> tags, and then extract the <tr> and <td> elements of those tables.

I am using the following, which works fine for <b> and </b>:

$file = file_get_contents($url);
preg_match_all ("/<b>(.*)<\/b>/U", $file, $pat_array);
print $pat_array[0][0]." <br> ".$pat_array[0][1]."\n";

Can anybody tell me a regular expression that matches nested tables of the form <table (some table properties)> some data using <tr> and <td> </table>? Please keep the href if one is present in the <tr> or <td> fields, and keep the nested tables in mind.

Example:

$file = "<html> <head> <title> asdf </title> </head> <body bgcolor = red >  <table border = 1> <table bgcolor = white> (some tr and td data) </table> </table> </body> </html>";

preg_match_all ("regular expression for table tag", $file, $pat_array);
print $pat_array[0][0]." <br> ".$pat_array[0][1]."\n";

Update 1:

When I tried the code below, it showed this error:

Notice: Undefined offset: 0 in C:\xampp\htdocs\testphp\tabledata.php on line 27

Code:

$file = file_get_contents($url);
$pat_array = Array();
preg_match_all ("/<tr>(.*)<\/tr>/U", $file, $pat_array);
print $pat_array[1][0];

Can anybody help me with this error as well?

santosh

1 Answer

Don't try to parse HTML with regex; use DOMDocument and DOMXPath instead.

$dom = new DOMDocument();
$dom->loadHtml($file);

$xpath = new DOMXpath($dom);
$tableNodes = $xpath->query('//table'); // select all table nodes

// do something, e.g. print node content
foreach ($tableNodes as $tableNode) {
    print $tableNode->nodeValue;
}
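
For the rows, cells, and links the question asks about, a minimal sketch along the same lines might look like the following. The sample markup, the helper name `extractTables`, and the returned array layout are assumptions made for illustration; none of them come from the question.

```php
<?php
// Hypothetical sample input: a table nested inside a cell of another table.
$file = '<html><body><table border="1"><tr><td>outer'
      . '<table bgcolor="white"><tr><td><a href="http://example.com/a">inner</a></td></tr></table>'
      . '</td></tr></table></body></html>';

/**
 * Hypothetical helper: returns, for every <table>, its rows as arrays of cells.
 * Each cell holds its text and the href values of the links inside it.
 */
function extractTables(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $tables = [];
    foreach ($xpath->query('//table') as $table) {
        $rows = [];
        // Only rows that belong to this table: direct children,
        // or children of its thead/tbody/tfoot.
        foreach ($xpath->query('./tr | ./thead/tr | ./tbody/tr | ./tfoot/tr', $table) as $tr) {
            $cells = [];
            foreach ($xpath->query('./td | ./th', $tr) as $cell) {
                $hrefs = [];
                foreach ($xpath->query('.//a[@href]', $cell) as $a) {
                    $hrefs[] = $a->getAttribute('href');
                }
                $cells[] = ['text' => trim($cell->textContent), 'hrefs' => $hrefs];
            }
            $rows[] = $cells;
        }
        $tables[] = $rows;
    }
    return $tables;
}

$tables = extractTables($file);
```

Because the row and cell queries are relative to the current node, each table only reports its own rows. Note that the text and links of an outer cell still include whatever a nested table inside it contains.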

There are many more queries you can perform with XPath. Also, you probably want to do something with the selected nodes other than just printing their content. If you are looking for the sub-DOM of each table, try this:

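foreach ($tableNodes as $tableNode) {
    // Build a standalone document that contains only this table.
    $newDom = new DOMDocument();
    // importNode() with deep = true copies the table and all its descendants.
    $clone = $newDom->importNode($tableNode, true);
    $newDom->appendChild($clone);

    // Markup of this table, including any nested tables.
    $html = $newDom->saveHTML();
}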
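
DOMDocument and DOMXPath belong to PHP's DOM extension, which is enabled by default, so normally nothing extra has to be installed. Pages fetched from live sites are often not valid HTML, though, and `loadHTML()` reports each problem as a PHP warning. One common way to cope, sketched here with a made-up sample string, is to let libxml collect the errors internally instead of emitting them:

```php
<?php
// Hypothetical sample with broken markup: unknown tag, unclosed <td> and <tr>.
$file = '<html><body><table><tr><td>one<td>two<foo>bar</table></body></html>';

// Collect libxml errors internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($file);

// Inspect the collected problems if needed, then clear the buffer
// and restore the previous error-handling setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The parser still builds a usable tree that XPath can query.
$xpath = new DOMXPath($dom);
$cells = $xpath->query('//table//td');
print $cells->length . " cells, " . count($errors) . " parse errors\n";
```

The parser repairs what it can, so the queries above keep working on the recovered tree. Whether the repaired structure matches what a browser would show is not guaranteed.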
SBH
  • How do I use this code? Do I need to install any other files, or are these facilities available in PHP by default? If extra packages are needed, how do I install them? I am on shared hosting with an IIS 7 server, so please suggest a suitable solution. – santosh Nov 17 '14 at 10:34
  • @santosh What errors do you get? The approach is correct; you may have invalid HTML, which will result in errors in `$dom->loadHtml(...)`. – SBH Nov 17 '14 at 12:30
  • Yes, you are right: it works with simple HTML, but when I pick the HTML page of any random live website, the loadHtml call reports a problem. – santosh Nov 17 '14 at 13:56