
Backstory

I used PHP to read an email and extract its HTML attachment, which I then stored in a PHP variable.

The problem is that I am trying to extract information from a nested table in that HTML and convert it into some sort of array so that I can store it in SQL.

Is there a solution for this? I have searched, but to no avail.

Example

<html>
    <table>
        <tr>
            <td>
                <table>
                    <tr>
                        <td></td>
                    </tr>
                </table>
            </td>
        </tr>
        <tr>
            <td>
                <table>
                    <tr>
                        <td>
                            <p>hi</p>
                        </td>
                    </tr>
                </table>
            </td>
        </tr>
    </table>
</html>

I want to locate the nearest (innermost) table that contains "hi" so that I can get all the information from that table.

Stuff I have tried

I have tried using Simple HTML DOM, but I suspect the HTML file I was scraping was so large that it caused memory issues.

include('./simple_html_dom.php');

/* Parse the HTML, stored as a string in $worksOrder */
$html = str_get_html($worksOrder);

/* Find every table, from the outermost wrapper down to the nested ones */
$messytables = $html->find('table');

print_r($messytables);
Nigel Ren
Shreamy

1 Answer


Rather than using simple HTML DOM, this uses DOMDocument and XPath to find the elements.

This draws on the answer to "XPath to find nearest ancestor element that contains an element that has an attribute with a certain value" to locate the <table> tags that enclose the <p> tags containing "hi". As there are several levels of enclosing <table> tags, it then uses last() (see "XSLT getting last element") to pick the innermost enclosing <table>.
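For comparison, the same "nearest enclosing table" idea can also be written the other way round: start from the <p> and walk up the ancestor axis. Because ancestor:: is a reverse axis in XPath 1.0, ancestor::table[1] is the closest enclosing table, so no last() is needed, and each matching <p> yields its own innermost table. This is a sketch, not the answer's exact expression; normalize-space() is an added assumption so that whitespace around "hi" does not prevent a match.

```php
<?php
// Sample HTML based on the question (stray closing </td> removed).
$worksOrder = <<<HTML
<html>
    <table>
        <tr><td><table><tr><td></td></tr></table></td></tr>
        <tr><td><table><tr><td><p>hi</p></td></tr></table></td></tr>
    </table>
</html>
HTML;

libxml_use_internal_errors(true); // suppress warnings about imperfect HTML
$doc = new DOMDocument();
$doc->loadHTML($worksOrder);
$xp = new DOMXPath($doc);

// Start at each <p> whose trimmed text is "hi", then step up the ancestor
// axis. ancestor:: is a reverse axis, so [1] selects the nearest <table>.
$tables = $xp->query('//p[normalize-space(.) = "hi"]/ancestor::table[1]');

foreach ($tables as $table) {
    echo $doc->saveHTML($table), "\n";
}
```

If the email contains several "hi" paragraphs, this returns one innermost table per paragraph, whereas wrapping the expression in (...)[last()] returns only the last table in document order.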

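libxml_use_internal_errors(true);   // hide libxml warnings about malformed HTML
$doc = new DOMDocument();
$doc->loadHTML( $worksOrder );
$xp = new DOMXPath($doc);

// Every <table> containing a <p> whose text is "hi", in document order;
// [last()] keeps the last one, i.e. the innermost enclosing table.
$table = $xp->query('(//ancestor::table[descendant::p="hi"])[last()]');

echo $doc->saveHTML($table[0]);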

The last line just displays the result; you can start from $table[0] and fetch whatever data you need from it.
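As a minimal sketch of that "fetch the data" step, the found table can be turned into a plain PHP array of rows, which is the shape the question wants for SQL. The helper name tableToArray is hypothetical, and it assumes cells hold plain text: it reads only the table's own rows (rows of further nested tables are skipped), but a cell's textContent still includes the text of anything nested inside that cell.

```php
<?php
/**
 * Hypothetical helper: convert a DOM <table> into an array of rows,
 * each row being an array of trimmed cell texts.
 */
function tableToArray(DOMXPath $xp, DOMElement $table): array
{
    $rows = [];
    // Only this table's own rows: direct <tr> children, or <tr> inside a
    // direct <thead>/<tbody>/<tfoot>. Rows of nested tables are not matched.
    $trs = $xp->query('./tr | ./thead/tr | ./tbody/tr | ./tfoot/tr', $table);
    foreach ($trs as $tr) {
        $cells = [];
        foreach ($xp->query('./td | ./th', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Small stand-alone example table (not from the question).
$html = '<table><tr><th>Item</th><th>Qty</th></tr>'
      . '<tr><td> Widget </td><td>3</td></tr></table>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

$table = $xp->query('//table')->item(0);
$data = tableToArray($xp, $table);
print_r($data);
```

With the answer's code you would call tableToArray($xp, $table[0]) instead of querying //table.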

With your test data, this outputs:

<table><tr>
<td>
                            <p>hi</p>
                        </td>
                    </tr></table>
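The question also mentions storing the extracted data in SQL. One possible sketch, assuming a rows array like the one above, inserts each cell with a PDO prepared statement inside a transaction. The table name works_order_cells and its columns are hypothetical, and sqlite::memory: is used only so the example runs on its own; in practice you would use your own DSN (for example MySQL) and schema.

```php
<?php
// Rows as produced by a table-to-array step (example data, not from the email).
$rows = [['Item', 'Qty'], ['Widget', '3']];

// In-memory SQLite purely for demonstration; replace the DSN for a real DB.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema: one record per cell, keyed by row and column index.
$pdo->exec('CREATE TABLE works_order_cells (row_no INTEGER, col_no INTEGER, value TEXT)');

// Prepare once and execute per cell; placeholders keep whatever text the
// emailed HTML contains from being interpreted as SQL.
$stmt = $pdo->prepare('INSERT INTO works_order_cells (row_no, col_no, value) VALUES (?, ?, ?)');

$pdo->beginTransaction();
foreach ($rows as $r => $cells) {
    foreach ($cells as $c => $value) {
        $stmt->execute([$r, $c, $value]);
    }
}
$pdo->commit();

$count = (int) $pdo->query('SELECT COUNT(*) FROM works_order_cells')->fetchColumn();
echo $count, "\n";
```

A row-per-cell layout is just the most generic choice; if the table has fixed headers, mapping each row to named columns is usually more convenient to query.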
Shreamy
Nigel Ren