I have this HTML from my company's site. Since I do not have access to the database, I want to parse the HTML file and extract the values. The code looks like this:

<?php
$string = '
<p> <b>HEADER INFO</b>
<table width=100% cellspacing=0>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>View Object:</b> 6600422</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>BPO:</b> G37147359-000000</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Ack Date:</b> 2012-05-28</font></td>
  </tr>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=3><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Operation(s):</b> PPS_Queue, PPS_Build, PPS_BoxAll, JPN_End</font></td>
  </tr>
</table>
</p>
<hr>
<p> <b>EXTERNAL ORDER NUMBER REFERENCE</b>
<table width=100% cellspacing=0>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>SAP Sales Order Number</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Customer P.O. Number</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Legacy Order Number</b></font></td>
  </tr>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">0310363858</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">77340892008-120413</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">89FF09378001</font></td>
  </tr>
</table>
</p>
<hr>
<p> <b>PRODUCTS FOR THIS WORK OBJECT/OPERATION(S)</b>
<table width=100% cellspacing=0>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>PL</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Product #</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Qty</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Options</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Serial #</b></font></td>
  </tr>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">3C</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">AP703B</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">1</font></td>
    <td valign=top colspan=1>&nbsp </td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">2S6219000G</font></td>
  </tr>
</table>
</p>
<hr>
<p> <b>Station Info</b>
<table width=100% cellspacing=0>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Start Station:</b> JPN_End</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Location:</b> Done</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Station:</b> </font></td>
  </tr>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Birth Date/Time:</b> 2012-05-23 14:20:32 SGT</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Power Cord:</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Voltage:</b></font></td>
  </tr>
</table>
</p>
<hr>
<p> <b>MATERIAL LIST FOR THIS WORK OBJECT/OPERATION(S)</b>
<table width=100% cellspacing=0>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Part Number</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Qty</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Description</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>BB Type</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Material Location</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Serial Number</b></font></td>
  </tr>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">AP703B@@</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">1</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">OEM Generic 1U SAS Enclosure</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">BOM</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">ASSY</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">2S6219000G</font></td>
  </tr>
</table>
</p>
 ';

 $result = parse_data($string);

extract($result);

echo $headertext.'<br />';
echo $sapSON.'<br />';
echo $custPON.'<br />';
echo $legacyON.'<br />';
echo $pl.'<br />';
echo $pn.'<br />';


function parse_data($string){
$string = str_replace('&nbsp;&nbsp;','',$string);

$xml = new DOMDocument();
@$xml->loadHTML($string);

$ret = array();

foreach($xml->getElementsByTagName('p') as $p) {
    $header = trim($p->nodeValue);
}

foreach($xml->getElementsByTagName('td') as $td) {
    $value = trim($td->nodeValue);
    if(!empty($value) && is_numeric($value[0])){ // $value{0} offset syntax was removed in PHP 8
        $ret[] = $value;
    }
}

$ret = array('headertext'=>$header,
             'sapSON'=>$ret[0],
             'custPON'=>$ret[1],
             'legacyON'=>$ret[2],
             'pl'=>$ret[3],
             'pn'=>$ret[4],);

return $ret;
}
?>

Now I want to save the header "EXTERNAL ORDER NUMBER REFERENCE" into a variable that I can use later on.

Also, the second, third, and fourth columns of the first row are the labels for the second, third, and fourth columns of the second row, respectively. I want to save these values to variables as well. So basically, I need a PHP script that parses this HTML file and gives me the following:

$header1 = "HEADER INFO";
$viewObject = "6600422";
$BPO = "G37147359-000000";
$AckDate = "2012-05-28";
$Operations = "PPS_Queue, PPS_Build, PPS_BoxAll, JPN_End";
$header2 = "EXTERNAL ORDER NUMBER REFERENCE";
$sapSON = "0310363858";
$custPON = "77340892008-120413";
$legacyON = "89FF09378001";
$header3 = "PRODUCTS FOR THIS WORK OBJECT/OPERATION(S)";
$pl = "3C";
$pn = "AP703B";
$qty = "1";
$options = "&nbsp;";
$serialNo = "2S6219000G";

And so on. Basically, I need all the table contents saved into variables, because I will later save them to my database, build a report from them, and generate barcodes for some of the details.

Thanks for the help!

FYI: I do not have access to the source database, so all I can do is parse this HTML file and save the values to variables, which I can store in my own database later on. Also, note that the headers are constant; the only values that change are the numbers, which differ from order to order.

JudeJitsu

1 Answer

Here, try this:

<?php
$string = '<p> <b>EXTERNAL ORDER NUMBER REFERENCE</b>
    <table width=100% cellspacing=0>
      <tr align=left>
        <td width=2% colspan=1><font face="verdana, arial, helvetica" size="-2">&nbsp;&nbsp;</font></td>
        <td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2"><b>SAP Sales Order Number</b></font></td>
        <td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2"><b>Customer P.O. Number</b></font></td>
        <td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2"><b>Legacy Order Number</b></font></td>
      </tr>
      <tr align=left>
        <td width=2% colspan=1><font face="verdana, arial, helvetica" size="-2">&nbsp;&nbsp;</font></td>
        <td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2">0310363858</font></td>
        <td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2">77340892008-120413</font></td>
        <td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2">89FF09378001</font></td>
  </tr>
    </table>
</p>
';

$result = parse_data($string);

extract($result);

echo $headertext.'<br />';
echo $sapSON.'<br />';
echo $custPON.'<br />';
echo $legacyON.'<br />';


function parse_data($string){
    $string = str_replace('&nbsp;&nbsp;','',$string);

    $xml = new DOMDocument();
    @$xml->loadHTML($string);

    $ret = array();

    foreach($xml->getElementsByTagName('p') as $p) {
        $header = trim($p->nodeValue);
    }

    foreach($xml->getElementsByTagName('td') as $td) {
        $value = trim($td->nodeValue);
        if(!empty($value) && is_numeric($value[0])){ // $value{0} offset syntax was removed in PHP 8
            $ret[] = $value;
        }
    }

    $ret = array('headertext'=>$header,
                 'sapSON'=>$ret[0],
                 'custPON'=>$ret[1],
                 'legacyON'=>$ret[2]);

    return $ret;
}
?>
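
The version above picks values by their first character and by fixed position, which only fits a table with exactly one data row. A minimal sketch of an alternative, assuming one table per call, a header row followed by any number of data rows, and a leading spacer cell in every row as in the markup above. The function name `table_rows` is hypothetical and not part of the original code:

```php
<?php
// Hypothetical helper: pair every data row with the header row of the same table.
function table_rows(string $html): array
{
    $doc = new DOMDocument();
    // The legacy, unquoted markup triggers parser warnings; suppress them.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            // Strip ordinary whitespace and non-breaking spaces (U+00A0) at both ends.
            $cells[] = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $td->textContent);
        }
        array_shift($cells); // drop the leading spacer cell (assumed present in every row)
        $rows[] = $cells;
    }

    // The first row holds the column labels; the rest are data rows.
    $headers = array_shift($rows);
    if ($headers === null) {
        return [];
    }

    $out = [];
    foreach ($rows as $cells) {
        // Skip rows whose cell count does not match the header row.
        if (count($cells) === count($headers)) {
            $out[] = array_combine($headers, $cells);
        }
    }
    return $out;
}
```

Each element of the result is one data row keyed by the column labels, so `$rows[0]['Product #']` and `$rows[1]['Product #']` would address the first and second products, however many rows the table has.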

Edit version 2 (Multiple rows):

Because each of your tables has a different layout, this becomes quite complex, but I like a challenge. Here you go; I hope it helps:

<?php
$result = parse_data($string);

//Create Variables From Values
foreach($result as $key=>$value){
    foreach($value as $key_b=>$value_b){
        $$key_b = $value_b;
    }
}
/* --New Available Variables--
    $header0 = HEADER INFO
    $ViewObject = 6600422
    $BPO = G37147359-000000
    $AckDate = 2012-05-28
    $Operations = PPS_Queue, PPS_Build, PPS_BoxAll, JPN_End
    $header1 = EXTERNAL ORDER NUMBER REFERENCE
    $SAPSalesOrderNumber = 0310363858
    $CustomerPONumber = 77340892008-120413
    $LegacyOrderNumber = 89FF09378001
    $header2 = PRODUCTS FOR THIS WORK OBJECT/OPERATION(S)
    $PL = 3C
    $Product = AP703B
    $Qty = 1
    $Options =  
    $Serial = 2S6219000G
    $header3 = Station Info
    $StartStation = JPN_End
    $Location = Done
    $Station = 
    $BirthDateTime = 2012-05-23 14:20:32 SGT
    $PowerCord = 
    $Voltage = 
    $header4 = MATERIAL LIST FOR THIS WORK OBJECT/OPERATION(S)
    $PartNumber = AP703B@@
    $Description = OEM Generic 1U SAS Enclosure
    $BBType = BOM
    $MaterialLocation = ASSY
    $SerialNumber = 2S6219000G
*/

function parse_data($string){
    $string = str_replace('&nbsp;&nbsp;','',$string);
    // Each section of the page is separated by an <hr>
    $parts = explode('<hr>',$string);

    $html = new DOMDocument();
    $ret = array();
    $entry=0;
    foreach($parts as $part){
        @$html->loadHTML($part);
        //Get Header
        foreach($html->getElementsByTagName('p') as $p) {
            $ret[$entry]['header'.$entry] = trim($p->nodeValue);
        }
        $i=0;
        foreach($html->getElementsByTagName('td') as $td){
            $value = trim($td->nodeValue);
            if(empty($value)){
                continue;
            }
            switch($entry){
                case 0:
                    // "Label: value" cells; split on the first colon only
                    $split = explode(':',$value,2);
                    $ret[$entry][preg_replace('/[^a-zA-Z]/s', '', $split[0])] = trim($split[1] ?? '');
                    break;
                case 1:
                    // Header cells first, then values that start with a digit
                    if(!is_numeric($value[0])){
                        $ret[$entry][$i] = trim($value);
                    }else{
                        $ret[$entry][preg_replace('/[^a-zA-Z]/s', '', $ret[$entry][$i-3])] = trim($value);
                        unset($ret[$entry][$i-3]);
                    }
                    break;
                case 2:
                    // Five header cells, then five value cells
                    if($i<=4){
                        $ret[$entry][$i] = trim($value);
                    }else{
                        $ret[$entry][preg_replace('/[^a-zA-Z]/s', '', $ret[$entry][$i-5])] = trim($value);
                        unset($ret[$entry][$i-5]);
                    }
                    break;
                case 3:
                    // Same "Label: value" layout as case 0; the limit keeps the time of day intact
                    $split = explode(':',$value,2);
                    $ret[$entry][preg_replace('/[^a-zA-Z]/s', '', $split[0])] = trim($split[1] ?? '');
                    break;
                case 4:
                    if($i<=5){
                        $ret[$entry][$i] = trim($value);
                    }else{
                        $ret[$entry][preg_replace('/[^a-zA-Z]/s', '', $ret[$entry][$i-6])] = trim($value);
                        unset($ret[$entry][$i-6]);
                    }
                    break;
            }
            $i++;
        }
        $entry++;
    }
    return $ret;
}
?>
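
The "Label: value" cells in the HEADER INFO and Station Info sections can also be parsed on their own. A minimal sketch, assuming every labelled cell contains a colon after its label; the function name `labelled_cells` is hypothetical. It splits on the first colon only, so a value that itself contains colons, such as a time of day, is kept whole:

```php
<?php
// Hypothetical helper: turn "Label: value" cells into an associative array.
function labelled_cells(string $html): array
{
    $doc = new DOMDocument();
    // The legacy markup triggers parser warnings; suppress them.
    @$doc->loadHTML($html);

    $out = [];
    foreach ($doc->getElementsByTagName('td') as $td) {
        $text = trim($td->textContent);
        if (strpos($text, ':') === false) {
            continue; // spacer or unlabelled cell
        }
        // A limit of 2 splits on the first colon only.
        [$label, $value] = explode(':', $text, 2);
        // Keep letters only so the label can serve as an array key.
        $key = preg_replace('/[^a-zA-Z]/', '', $label);
        if ($key !== '') {
            $out[$key] = trim($value);
        }
    }
    return $out;
}
```

Keeping the result as an array per section, rather than extracting it into variables, also avoids name collisions when two sections share a label.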
Lawrence Cherone
  • The only problem I see is if you have more data than you have given in your example, like more than one row of data or multiple `p` tags. – Lawrence Cherone May 28 '12 at 03:02
  • Which is the case on my side. This is a multiple tabled output. So i have to parse thru all the data. I'll see what I can work on. Thank you very much! With this, I can start parsing thru the html and maybe find another workaround with the few bumps ahead :D – JudeJitsu May 28 '12 at 03:06
  • You can assign each row to a sub array then when using extract the rows will be available like `$sapSON[0]` or `$sapSON[1]` – Lawrence Cherone May 28 '12 at 03:09
  • your code works great! However, it doesn't fetch all the data I need when I put in the whole html page i need to parse. Thanks for the help, if you kindly can help me troubleshoot some more... Please see my edit... Thank you! – JudeJitsu May 29 '12 at 01:11
  • Wow! Let me try this! Thanks! – JudeJitsu May 30 '12 at 02:50
  • I encountered a bigger problem: the number of table rows under "PRODUCTS FOR THIS WORK OBJECT/OPERATION(S)" and "MATERIAL LIST FOR THIS WORK OBJECT/OPERATION(S)" varies. For one work object there may be only one sub-material tied to it; for others there are around 50. Is there a workaround for this? – JudeJitsu Jun 05 '12 at 01:59