I have a simple table:
<table>
<tr>
<td>T1 Row 1 Col 1</td>
<td>T1 Row 1 Col 2 <IMG SRC="someimage.png" TITLE="sometitle" /></td>
<td>T1 <a href="somelink.htm">Row</a> 1 Col 3</td>
</tr>
<tr>
<td><div class="someclass">T1 Row 2 Col 1</div></td>
<td>T1 Row 2 Col 2</td>
<td>T1 Row 2 Col 3</td>
</tr>
</table>
I need to parse it into a PHP array, indexed by table, row, and column, so that:
$arr[0][0][0]; //would equal "T1 Row 1 Col 1"
$arr[0][0][1]; //would equal "T1 Row 1 Col 2 <IMG SRC="someimage.png" TITLE="sometitle" />"
$arr[0][0][2]; //would equal "T1 <a href="somelink.htm">Row</a> 1 Col 3"
$arr[0][1][0]; //would equal "<div class="someclass">T1 Row 2 Col 1</div>"
I have tried the DOM way:
$dom = new DOMDocument;
$html = $dom->loadHTML($HTMLTable);
$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
$cols = $rows->item(0)->getElementsByTagName('th');
$row_headers = NULL;
foreach ($cols as $node) {
    $row_headers[] = $node->innerHTML;
}
$table = array();
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach ($rows as $row) {
    $cols = $row->getElementsByTagName('td');
    $row = array();
    $i = 0;
    foreach ($cols as $node) {
        # code...
        if ($row_headers == NULL)
            $row[] = $node->nodeValue;
        else
            $row[$row_headers[$i]] = $node->innerHTML;
        $i++;
    }
    $table[] = $row;
}
But it appears there is no way to extract the content of a TD cell verbatim, with its HTML intact. The result is always just the text; any images, links, or div markup inside the cell are dropped. I've tried several properties, including nodeValue, textContent, plaintext, and innerHTML. I'm probably missing something obvious, so any advice would be much appreciated.
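For reference, here is a minimal sketch of one direction that might work, assuming the classic DOMDocument API: serialize each child of the cell with DOMDocument::saveHTML(), which accepts a node argument, and concatenate the pieces. The helper names innerHtml() and tablesToArray() are hypothetical, not part of the DOM extension. Note that the markup is re-serialized by libxml rather than copied byte for byte, so details such as tag case (IMG becomes img) and the self-closing slash are normalized. Nested tables are not handled.

```php
<?php
// Hypothetical helper: returns the markup inside a node by serializing
// each child node with DOMDocument::saveHTML() and joining the results.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return trim($html);
}

// Hypothetical helper: builds $arr[table][row][col] from an HTML string.
// Assumes flat (non-nested) tables, because getElementsByTagName()
// also matches descendants of nested tables.
function tablesToArray(string $markup): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // silence warnings on sloppy HTML
    $dom->loadHTML($markup);
    libxml_clear_errors();

    $result = [];
    foreach ($dom->getElementsByTagName('table') as $table) {
        $rows = [];
        foreach ($table->getElementsByTagName('tr') as $tr) {
            $cells = [];
            foreach ($tr->getElementsByTagName('td') as $td) {
                $cells[] = innerHtml($td);
            }
            $rows[] = $cells;
        }
        $result[] = $rows;
    }
    return $result;
}

// The table from the question.
$HTMLTable = <<<'HTML'
<table>
<tr>
<td>T1 Row 1 Col 1</td>
<td>T1 Row 1 Col 2 <IMG SRC="someimage.png" TITLE="sometitle" /></td>
<td>T1 <a href="somelink.htm">Row</a> 1 Col 3</td>
</tr>
<tr>
<td><div class="someclass">T1 Row 2 Col 1</div></td>
<td>T1 Row 2 Col 2</td>
<td>T1 Row 2 Col 3</td>
</tr>
</table>
HTML;

$arr = tablesToArray($HTMLTable);
print_r($arr);
```

With this approach, $arr[0][1][0] should keep the div wrapper and $arr[0][0][1] should keep the image tag, though in normalized form rather than exactly as typed.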