
I'm trying to extract specific link attributes from a deeply nested table structure. The document is old, which explains the heavy use of table elements for page layout.

Here is the relevant part of the document, which I'm trying to parse using DOMXPath. Each table with a width of 100% has the same nested children, i.e. tbody, tr, td, a, div, etc.

<table width="1000px">
    <tbody>
        <tr></tr>
        <tr>
            <td>
                <br>
                <span></span>
                <span></span>
                <div></div>
                <div>
                    <div></div>
                    <div>
                        <center></center>
                        <hr>
                        <table width="100%"></table>
                        <table width="100%">
                            <tbody>
                                <tr>
                                    <td>
                                        <a name="A"></a>
                                        <div style="width: 230px;">
                                            <a href="owlbook/manufacturer.aspx?manufacturerId=124">Owl Chant Book</a>
                                            <br>
                                        </div>
                                    </td>
                                </tr>
                            </tbody>
                        </table>
                        <table width="100%"></table>
                        <table width="100%"></table>
                        <table width="100%"></table>
                        <table width="100%"></table>
                        <table width="100%"></table>
                        <table width="100%"></table>
                        <table width="100%"></table>
                        <table width="100%"></table>
                        <table width="100%"></table>
                        <table width="100%"></table>
                        <table width="100%"></table>
                        <table width="100%"></table>
                        <table width="100%"></table>
                        <table width="100%"></table>
                        <table width="100%"></table>
                    </div>
                </div>
            </td>
        </tr>
    </tbody>
</table>

And here is the code I'm using to parse it. I'm trying to get the href value and the text content of the anchor nested deep within the divs.

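public function parseManufacturerNodes($results) {
    // Collect libxml parse warnings internally instead of silencing all PHP errors.
    $previous = libxml_use_internal_errors(true);
    $this->dom = new DOMDocument();
    $this->dom->loadHTML($results);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $this->domQuery = new DOMXPath($this->dom);
    $this->nodes = $this->domQuery->query("//table/tbody/tr/td/div/div/div/div/table/tbody/tr/td/div");
    var_dump($this->nodes);
    foreach ($this->nodes as $node) {
        // A DOMElement cannot be echoed directly; print its text content.
        echo $node->textContent;
    }
}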

This doesn't work at all. I've tried changing the query to match the document structure, to no avail. var_dump returns:

object(DOMNodeList)#44 (1) { ["length"]=> int(0) }

How would I extract the anchor attributes from each of the divs within the inner tables that have a width of 100%? In this case that would return href="owlbook/manufacturer.aspx?manufacturerId=124" and textValue = Owl Chant Book.

Any help would be appreciated, as I don't think I'm making any progress towards a viable solution.

Thanks, Maxx

  • At first sight your code seems fine. Where did you get that input from? It has horribly broken markup; did you remove its contents and break it? And: is that HTML from Firebug, or what you get from the web server directly? Firebug changes the HTML. – Jens Erat Oct 07 '13 at 09:41
  • Yes, I removed the closing table tag, body tag, form tags etc. at the bottom so that it's easier to post on Stack Overflow. I should have mentioned that in a note in my question, but I don't know how to do that. And I did copy it from Firebug. But the input to the code was the full page, which has the markup intact – Maxx Oct 07 '13 at 11:03
  • Just for the sake of posting it here, I omitted the closing tags towards the bottom of the post – Maxx Oct 07 '13 at 11:05
  • If you've copied it from Firebug, have a look at this -- I think the rest of your markup is fine. I couldn't test it though, as the markup is too broken. Please make sure to post valid markup, or HTML/XML parsers will refuse it. http://stackoverflow.com/questions/18241029/why-does-my-xpath-query-scraping-html-tables-only-work-in-firebug-but-not-the – Jens Erat Oct 07 '13 at 11:07
  • Alright, I have edited the markup. Please take a look. Each table with a width of 100% has the same level of child elements. I have only expanded the second of these, to keep the post clean and readable – Maxx Oct 07 '13 at 11:56
  • Have you checked the linked question? Are the tbody elements really occurring in the original HTML? Because everything else looks fine at first sight. – Jens Erat Oct 07 '13 at 14:11
  • Yes, I have visited the above link, and the tbody really does occur in the original HTML source. – Maxx Oct 07 '13 at 14:40
  • Any help, anyone? I've been stuck on this for several days now... Nothing seems to work. – Maxx Oct 11 '13 at 03:11
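
The comments above circle around whether the absolute path in the query really matches the HTML the server sends. One observation from the markup as posted: between the outer td and the inner tables there are two div levels, while the query lists four, so against that sample the path would match nothing regardless of tbody. The following is only a sketch of a less brittle approach, assuming the real page resembles the sample: select the inner tables by their width attribute and use the descendant axis (//) so that the presence or absence of tbody, or an extra wrapper div, does not matter. The function name extractManufacturerLinks and the trimmed-down $html sample are made up for illustration, and the scalar type hints assume PHP 7 or later.

```php
<?php
// Hypothetical sample modelled on the markup in the question.
// It deliberately has no <tbody> tags, as a server might send it.
$html = <<<'HTML'
<table width="1000px">
  <tr><td>
    <div><div>
      <table width="100%"></table>
      <table width="100%">
        <tr><td>
          <a name="A"></a>
          <div style="width: 230px;">
            <a href="owlbook/manufacturer.aspx?manufacturerId=124">Owl Chant Book</a><br>
          </div>
        </td></tr>
      </table>
    </div></div>
  </td></tr>
</table>
HTML;

// Illustrative helper (not from the original post): returns one
// ['href' => ..., 'text' => ...] entry per matching anchor.
function extractManufacturerLinks(string $html): array
{
    // Keep libxml parse warnings out of the output without hiding other PHP errors.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    // Any table with width="100%", then any td below it (tbody or not),
    // then a div directly inside the td, then an anchor carrying an href.
    foreach ($xpath->query('//table[@width="100%"]//td/div/a[@href]') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

print_r(extractManufacturerLinks($html));
```

The anchor with only a name attribute is skipped by the [@href] predicate. If the real page marks up its inner tables differently (for example with a style attribute instead of width), the predicate would need adjusting accordingly.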
