1

The gateway returns an unusual result: HTML embedded inside XML, which confuses me. When I echo the variable $result, this is the output:

<Results>
    <XML_Report>
       <Subject>
         <EFX_Code>199</EFX_Code>
         <Referral>SPECIAL_WOHA</Referral>
       </Subject>
    </XML_Report>
<HTML_Report>
<![CDATA[
        <html>
        <head>


        </head>
        <body>



        <a name="mergereport" />

        <p>MERGE REPORT</p>

        <table border="1" WIDTH="100%" cellpadding=0 cellspacing=0>
        <tr><td class=heading colspan=4 align="center" bgcolor="#c0c0c0"><p class=heading>Personal Information Since 08/09/09 FAD 04/17/12</p></td></tr>
        <tr><td><br /></td><td><br /></td><td width="15%" align=center><p><b>Reported</b></p></td><td align=center><p><b>Bur</b></p></td></tr>
        <tr>
        <td width="15%" valign=top align=right><p class=pipad><b>
        Name<br />
        SSN<br />
        Inquiry SSN<br />
        DOB<br />
        Address
        </b></p></td>
        </tr></table>
        </body>

        </html>
]]>
 </HTML_Report>
</Results>

How can I parse that variable with PHP to extract only the part of the HTML I want, e.g. anything within particular tags? I've searched a lot but can't find a proper answer on whether such parsing is possible and, more importantly, how to do it.

ProDraz

4 Answers

2
$doc = new DOMDocument();
$doc->loadHTML($your_html);

Then read up on how to use the DOM library.
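
As a minimal sketch of where to go from there: the snippet below assumes `$your_html` already holds only the HTML string, and that the text of the `<p>` elements is what you are after. The sample input and the choice of tag are illustrative assumptions, not something the gateway guarantees.

```php
<?php
// Assumed sample input: a trimmed-down version of the report HTML.
$your_html = '<html><body><p>MERGE REPORT</p>'
           . '<table><tr><td><p>Reported</p></td></tr></table></body></html>';

// Collect parse warnings instead of printing them; report HTML is rarely valid.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($your_html);
libxml_clear_errors();

// Gather the trimmed text of every <p> element, in document order.
$paragraphs = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    $paragraphs[] = trim($p->textContent);
}

print_r($paragraphs);
```

Swapping `'p'` for another tag name, or using `DOMXPath` for more selective queries, follows the same pattern.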

Anthony
0

In an ideal world, the XML_Report would be for scripts like your PHP to read, and the HTML_Report would only be for human display. That doesn't, however, appear to be the case from the sample you posted.

You have two parsing tasks here.

First, parse the XML. Navigate within it (via XPath or DOM functions) to the CDATA contents of the HTML_Report element.

Now, the second task: parse the HTML, just as if you'd received it as a raw string.

If what you're asking is "how do I parse HTML using PHP?" there are around 1.874 billion answers on this very site.
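
The two steps above can be sketched roughly as follows. This is one possible approach, not the only one: it uses SimpleXML for the outer XML and DOMDocument with DOMXPath for the embedded HTML. The sample `$result` string and the XPath query for the heading cell are assumptions made for illustration.

```php
<?php
// Assumed sample input modelled on the gateway response in the question.
$result = <<<'EOT'
<Results>
  <XML_Report><Subject><EFX_Code>199</EFX_Code></Subject></XML_Report>
  <HTML_Report><![CDATA[
    <html><body>
      <p>MERGE REPORT</p>
      <table><tr><td class="heading"><p class="heading">Personal Information</p></td></tr></table>
    </body></html>
  ]]></HTML_Report>
</Results>
EOT;

// Step 1: parse the XML. Casting an element to string returns its text,
// including the contents of the CDATA section.
$xml = simplexml_load_string($result);
if ($xml === false) {
    throw new RuntimeException('Gateway response is not well-formed XML');
}
$efxCode = (string) $xml->XML_Report->Subject->EFX_Code;
$html    = (string) $xml->HTML_Report;

// Step 2: parse the extracted HTML as an ordinary HTML string.
libxml_use_internal_errors(true); // keep warnings about sloppy markup quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Query the HTML with XPath; here, the text of the heading cell.
$xpath   = new DOMXPath($doc);
$heading = trim($xpath->evaluate('string(//td[@class="heading"])'));

echo $efxCode, "\n", $heading, "\n";
```

Keeping the two parsers separate means the XML layer never has to understand the loosely written HTML inside the CDATA block.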

Borealid
-1
$html = substr($xml, strpos($xml, '<html>'), 
               strpos($xml, '</html>') - strpos($xml, '<html>') + 7);
Jack
-2

A quick and dirty solution:

//Assumes the contents of the xml file are in a string called $xml
$arr = explode("<HTML_Report>", $xml);
if(count($arr) > 1)
{
    $arr2 = explode("</HTML_Report>", $arr[1]);
    $html_portion = $arr2[0];
}

Summary: split the XML string at the HTML_Report start and end tags, each time keeping only the element of the resulting array that contains the HTML portion. $html_portion will still contain the CDATA wrapper; to avoid that, split on the CDATA delimiters (`<![CDATA[` and `]]>`) instead.

It ain't elegant but it gets the job done.

EDIT: Fixed code from $xml[1] to $arr[1] - thanks Marc B.

TheOx
  • using `$xml[1]` would simply be the 2nd char of the entire xml document, since presumably $xml is just a php string... – Marc B Apr 20 '12 at 02:58
  • @MarcB you're right - typo, supposed to be $arr[1] not $xml[1] – TheOx Apr 20 '12 at 03:04
  • @TheOx Guessing, but it's probably because `` could occur within the body of another `` tag, so your code isn't actually correct... I personally recommend using a parser to parse structured languages, instead of hacking string manipulations. – Borealid Apr 20 '12 at 15:20
  • @Borealid - I see what you're saying, although I answered on the assumption that the XML format was pretty much set with what the user posted. But you're right - a parser is generally a more stable and flexible solution. – TheOx Apr 20 '12 at 17:17