I can parse the HTML page, but I only get the text content of each cell, whereas I want the entire HTML markup inside each <tr> and <td>. Below is my PHP code:
<?php
$dom = new DOMDocument();

// Load the HTML file
$dom->loadHTMLFile("hydrocarbon.htm");

// Discard white space
//$dom->preserveWhiteSpace = false;

// Get the tables by tag name
$tables = $dom->getElementsByTagName('table');

// Get all rows from the first table
$rows = $tables->item(0)->getElementsByTagName('tr');

// Read the column headers from the first row
$cols = $rows->item(0)->getElementsByTagName('th');
$row_headers = NULL;
foreach ($cols as $node) {
    //print $node->nodeValue."\n";
    $row_headers[] = $node->nodeValue;
}

$table = array();
foreach ($rows as $row) {
    // Get each cell by tag name
    $cols = $row->getElementsByTagName('td');
    $row = array();
    $i = 0;
    foreach ($cols as $node) {
        //print $node->nodeValue."\n";
        if ($row_headers == NULL) {
            $row[] = $node->nodeValue;
        } else {
            $row[$row_headers[$i]] = $node->nodeValue;
        }
        $i++;
    }
    $table[] = $row;
}

//var_dump($table);
print("<pre>".print_r($table,true)."</pre>");
?>
The result contains only the text of each cell. This is my HTML code:
<table>
<thead>
<tr><th>Column 1</th><th>Column 2</th><th>Column 3</th></tr>
</thead>
<tbody>
<tr> <td><b>Q</b></td><td>Desc.</td> </tr>
<tr> <td>Type</td><td>Multiple choice</td> </tr>
<tr><td>Option</td><td>image #####2</td><td>incorrect</td></tr>
<tr><td>Option</td><td>image #####2</td><td>incorrect</td></tr>
<tr><td>Option</td><td>image #####2</td><td>incorrect</td></tr>
<tr><td>Option</td><td>image #####2</td><td>incorrect</td></tr>
<tr><td>Solution</td><td>Some text / image</td></tr>
<tr><td>Marks</td><td>4</td><td>1</td></tr>
</tbody>
</table>
The parser returns Q and not <b>Q</b>. How can I get the markup inside each cell as well?
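To make the goal concrete, here is a minimal sketch of one possible approach. It assumes that DOMDocument::saveHTML() accepts a node argument (available since PHP 5.3.6) and serializes each child of the cell in turn. The helper name innerHtml is hypothetical, not part of the DOM extension, and the inline sample stands in for hydrocarbon.htm.

```php
<?php
// Hypothetical helper (not a built-in): returns the markup inside a node
// by serializing each of its children with DOMDocument::saveHTML($node).
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
// Inline sample instead of hydrocarbon.htm so the sketch is self-contained.
$dom->loadHTML('<table><tr><td><b>Q</b></td><td>Desc.</td></tr></table>');

$cells = array();
foreach ($dom->getElementsByTagName('td') as $td) {
    // nodeValue would give "Q"; innerHtml() keeps the <b> tag.
    $cells[] = innerHtml($td);
}
print_r($cells);
```

In the code above, this would mean replacing $node->nodeValue with innerHtml($node) wherever a cell is stored.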
Edit 1: This is the original table that the solution needs to work on:
<table class=MsoNormalTable border=1 cellspacing=0 cellpadding=0 width=610 style='width:457.25pt;margin-left:10.8pt;background:#CED7E7;border-collapse:
collapse;border:none'>
<tr style='height:30.35pt'>
<td width=112 valign=top style='width:84.0pt;border:solid black 1.0pt;
background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:30.35pt'>
<p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
style='border:none'>Question<span style='border:none'> </span></span>
</span>
</p>
</td>
<td width=498 colspan=2 valign=top style='width:373.25pt;border:solid black 1.0pt;
border-left:none;background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;
height:30.35pt'>
<p class=MsoNormal style='margin-top:0cm;margin-right:-48.45pt;margin-bottom:
0cm;margin-left:18.0pt;margin-bottom:.0001pt;line-height:115%;border:none'><b><span
lang=EN-US style='font-family:"Garamond","serif";border:none'><span
style='border:none'>Consider the following reaction,</span></span></b>
</p>
<p class=MsoNormal style='margin-top:0cm;margin-right:-48.45pt;margin-bottom:
0cm;margin-left:18.0pt;margin-bottom:.0001pt;line-height:115%'><b><span
lang=EN-US style='font-family:"Garamond","serif";border:none'><span
style='border:none'>H</span></span></b><b><sub><span lang=EN-US
style='font-family:"Garamond","serif";border:none'><span style='border:none'>3</span></span></sub></b><b><span
lang=EN-US style='font-family:"Garamond","serif";border:none'><span
style='border:none'>C – CH – CH – CH</span></span></b><b><sub><span
lang=EN-US style='font-family:"Garamond","serif";border:none'><span
style='border:none'>3</span></span></sub></b><b><span lang=EN-US
style='font-family:"Garamond","serif";border:none'><span style='border:none'>
+ </span></span></b><b><span lang=EN-US style='font-family:"Garamond","serif";
position:relative;top:2.0pt;border:none'><img width=26 height=29
src="hydrocarbon2_files/image001.png"></span></b><b><span lang=EN-US
style='font-family:"Garamond","serif";border:none'><span style='border:none'> →
‘X’ + HBr </span></span></b>
</p>
<p class=MsoNormal style='margin-top:0cm;margin-right:-48.45pt;margin-bottom:
0cm;margin-left:18.0pt;margin-bottom:.0001pt;line-height:115%'><b><span
lang=EN-US style='font-family:"Garamond","serif";border:none'><span
style='border:none'> | |</span></span></b>
</p>
<p class=MsoNormal style='margin-top:0cm;margin-right:-48.45pt;margin-bottom:
0cm;margin-left:18.0pt;margin-bottom:.0001pt;line-height:115%'><b><span
lang=EN-US style='font-family:"Garamond","serif";border:none'><span
style='border:none'> D CH</span></span></b><b><sub><span
lang=EN-US style='font-family:"Garamond","serif";border:none'><span
style='border:none'>3</span></span></sub></b>
</p>
<p class=MsoNoSpacing style='margin-top:0cm;margin-right:-48.45pt;margin-bottom:
0cm;margin-left:.3pt;margin-bottom:.0001pt;text-align:justify;text-indent:
-.3pt'><b><span lang=EN-GB style='font-size:16.0pt;font-family:"Chaparral Pro","serif"'> </span></b>
</p>
</td>
</tr>
<tr style='height:15.0pt'>
<td width=112 valign=top style='width:84.0pt;border:solid black 1.0pt;
border-top:none;background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;
height:15.0pt'>
<p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
style='border:none'>Type</span></span>
</p>
</td>
<td width=498 colspan=2 valign=top style='width:373.25pt;border-top:none;
border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
<p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
style='border:none'>multiple_choice</span></span>
</p>
</td>
</tr>
<tr style='height:15.0pt'>
<td width=112 valign=top style='width:84.0pt;border:solid black 1.0pt;
border-top:none;background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;
height:15.0pt'>
<p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
style='border:none'>Option</span></span>
</p>
</td>
<td width=219 valign=top style='width:164.25pt;border-top:none;border-left:
none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
<p class=BodyA><span style='font-size:16.0pt;color:black;border:none'><img
width=205 height=93 src="hydrocarbon2_files/image002.jpg"></span>
</p>
</td>
<td width=279 valign=top style='width:209.0pt;border-top:none;border-left:
none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
<p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
style='border:none'>I</span></span><span lang=EN-US style='font-size:16.0pt;
border:none'><span style='border:none'>n<span style='border:none'>correct</span></span>
</span>
</p>
</td>
</tr>
<tr style='height:15.0pt'>
<td width=112 valign=top style='width:84.0pt;border:solid black 1.0pt;
border-top:none;background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;
height:15.0pt'>
<p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
style='border:none'>Option</span></span>
</p>
</td>
<td width=219 valign=top style='width:164.25pt;border-top:none;border-left:
none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
<p class=BodyA><span style='font-size:16.0pt;border:none'><img width=205
height=102 id="Picture 13" src="hydrocarbon2_files/image003.jpg"></span>
</p>
</td>
<td width=279 valign=top style='width:209.0pt;border-top:none;border-left:
none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
<p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
style='border:none'>C</span></span><span lang=EN-US style='font-size:16.0pt;
border:none'><span style='border:none'>orrect</span></span>
</p>
</td>
</tr>
<tr style='height:15.0pt'>
<td width=112 valign=top style='width:84.0pt;border:solid black 1.0pt;
border-top:none;background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;
height:15.0pt'>
<p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
style='border:none'>Option</span></span>
</p>
</td>
<td width=219 valign=top style='width:164.25pt;border-top:none;border-left:
none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
<p class=BodyA><span style='font-size:16.0pt;border:none'><img width=205
height=107 id="Picture 16" src="hydrocarbon2_files/image004.jpg"></span>
</p>
</td>
<td width=279 valign=top style='width:209.0pt;border-top:none;border-left:
none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
<p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
style='border:none'>Incorrect</span></span>
</p>
</td>
</tr>
<tr style='height:15.0pt'>
<td width=112 valign=top style='width:84.0pt;border:solid black 1.0pt;
border-top:none;background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;
height:15.0pt'>
<p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
style='border:none'>Option</span></span>
</p>
</td>
<td width=219 valign=top style='width:164.25pt;border-top:none;border-left:
none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
<p class=BodyA><span style='font-size:16.0pt;border:none'><img width=205
height=112 id="Picture 19" src="hydrocarbon2_files/image005.jpg"></span>
</p>
</td>
<td width=279 valign=top style='width:209.0pt;border-top:none;border-left:
none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
<p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
style='border:none'>Incorrect</span></span>
</p>
</td>
</tr>
<tr style='height:15.0pt'>
<td width=112 valign=top style='width:84.0pt;border:solid black 1.0pt;
border-top:none;background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;
height:15.0pt'>
<p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
style='border:none'>Solution</span></span>
</p>
</td>
<td width=498 colspan=2 valign=top style='width:373.25pt;border-top:none;
border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
<p class=MsoNormal style='margin-left:27.0pt;text-align:justify;text-indent:
-27.0pt;line-height:115%'><span style='font-family:"Garamond","serif";
border:none'><img width=398 height=92 id="Picture 10"
src="hydrocarbon2_files/image006.jpg"></span>
</p>
</td>
</tr>
<tr style='height:15.0pt'>
<td width=112 valign=top style='width:84.0pt;border:solid black 1.0pt;
border-top:none;background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;
height:15.0pt'>
<p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
style='border:none'>Marks</span></span>
</p>
</td>
<td width=219 valign=top style='width:164.25pt;border-top:none;border-left:
none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
<p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
style='border:none'>4</span></span>
</p>
</td>
<td width=279 valign=top style='width:209.0pt;border-top:none;border-left:
none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
<p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
style='border:none'>1</span></span>
</p>
</td>
</tr>
</table>
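A table like this has no <th> row, uses unquoted attributes, and wraps every value in nested <p> and <span> elements. The following sketch shows how such markup might be read row by row while keeping the nested tags. The shortened markup is a stand-in for the real export, cellHtml is a hypothetical helper name, and suppressing parser warnings with libxml_use_internal_errors() is an assumption about how noisy the real file is.

```php
<?php
// Hypothetical helper: serialize the children of a cell so that the
// nested <p>, <span>, <img> etc. are kept instead of only the text.
function cellHtml(DOMNode $cell): string
{
    $html = '';
    foreach ($cell->childNodes as $child) {
        $html .= $cell->ownerDocument->saveHTML($child);
    }
    return trim($html);
}

// Shortened stand-in for the Word export (most attributes left out).
$markup = <<<HTML
<table class=MsoNormalTable border=1>
 <tr><td width=112><p class=BodyA><span>Type</span></p></td>
     <td colspan=2><p class=BodyA><span>multiple_choice</span></p></td></tr>
 <tr><td width=112><p class=BodyA><span>Marks</span></p></td>
     <td><p class=BodyA><span>4</span></p></td>
     <td><p class=BodyA><span>1</span></p></td></tr>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($markup);
libxml_clear_errors();

$table = array();
foreach ($dom->getElementsByTagName('tr') as $tr) {
    $row = array();
    foreach ($tr->getElementsByTagName('td') as $td) {
        $row[] = cellHtml($td);     // numeric keys, since there are no headers
    }
    $table[] = $row;
}
print_r($table);
```

Rows with a colspan=2 cell simply end up with two entries instead of three, so any code that consumes $table has to allow for rows of different lengths.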