I want to read whatever is inside the <q:content></q:content> tags in the following XML:

$xml = '<?xml version="1.0"?>
                    <q:response xmlns:q="http://api-url">
                        <q:impression>
                            <q:content>
                                <html>
                                    <head>
                                        <meta name="HandheldFriendly" content="True">
                                        <meta name="viewport" content="width=device-width, user-scalable=no">
                                        <meta http-equiv="cleartype" content="on">
                                    </head>
                                    <body style="margin:0px;padding:0px;">
                                        <iframe scrolling="no" src="http://some-url" width="320px" height="50px" style="border:none;"></iframe>
                                    </body>
                                </html>
                            </q:content>
                            <q:cpc>0.02</q:cpc>
                        </q:impression>
                    ...
                        ... some more things
                    ...
                    </q:response>';

I have put the xml in the variable above and then I use SimpleXMLElement::getNamespaces as given in the section "Example #1 Get document namespaces in use" -

//code continued
$dom = new DOMDocument;
 // load the XML string defined above
$dom->loadXML($xml);

var_dump($dom->getElementsByTagNameNS('http://api-url', '*') ); // shows object(DOMNodeList)#3 (0) { } 


foreach ($dom->getElementsByTagNameNS('http://api-url', '*') as $element) 
{
    //this does not execute
    echo 'see - local name: ', $element->localName, ', prefix: ', $element->prefix, "\n";
}

But the code inside the foreach loop does not execute.

I have read these questions -

Update
Also tried this solution Parse XML with Namespace using SimpleXML -

$xml = new SimpleXMLElement($xml);
$xml->registerXPathNamespace('e', 'http://api-url');

foreach($xml->xpath('//e:q') as $event) {
    echo "not coming here";
    $event->registerXPathNamespace('e', 'http://api-url');
    var_export($event->xpath('//e:content'));
}

In this case too, the code inside the foreach does not execute. I am not sure whether I wrote everything correctly ...
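For comparison, here is a minimal sketch of the XPath approach against a small well-formed document. It assumes the intent was to select the <q:content> element: `q` is only the prefix, so `//e:q` looks for an element whose local name is "q", and none exists. The sample XML and the variable names are made up for illustration; only the namespace URI comes from the question.

```php
<?php
// Well-formed sample; the namespace URI is the one from the question,
// the element contents are placeholders for illustration.
$xml = '<?xml version="1.0"?>
<q:response xmlns:q="http://api-url">
    <q:impression>
        <q:content>hello</q:content>
        <q:cpc>0.02</q:cpc>
    </q:impression>
</q:response>';

$sx = new SimpleXMLElement($xml);

// "e" is an arbitrary local alias; only the URI has to match the document.
$sx->registerXPathNamespace('e', 'http://api-url');

// No element has the local name "q", so this selects nothing.
// A non-array result is normalised to an empty array to keep count() safe.
$none = $sx->xpath('//e:q') ?: array();

// Selecting by the element's local name finds <q:content>.
$contents = $sx->xpath('//e:content');

echo count($none), "\n";
echo trim((string) $contents[0]), "\n";
```

Note that this only works once the XML is well-formed; with a malformed string the SimpleXMLElement constructor fails before any XPath query runs.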

Further Update
Going with the first solution and with error_reporting = -1, I found that the problem is the URL in the src attribute of the iframe tag. I am getting warnings like:

Warning: DOMDocument::loadXML(): EntityRef: expecting ';' in Entity, line: 13

Updated code -

$xml = '<?xml version="1.0"?>
                    <q:response xmlns:q="http://api-url">
                        <q:impression>
                            <q:content>
                                <html>
                                    <head>
                                        <meta name="HandheldFriendly" content="True" />
                                        <meta name="viewport" content="width=device-width, user-scalable=no" />
                                        <meta http-equiv="cleartype" content="on" />
                                    </head>
                                    <body style="margin:0px;padding:0px;">
                                        <iframe scrolling="no" src="http://serve.qriously.com/v1/request?type=SERVE&aid=ratingtest&at=2&uid=0000000000000000&noHash=true&testmode=true&ua=Mozilla/5.0 (Linux; U; Android 2.2.1; en-us; Nexus One Build/FRG83) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1&appid=12e2561f048158249e30000012e256826ad&pv=2&rf=2&src=admarvel&type=get&lang=eng" width="320px" height="50px" style="border:none;"></iframe>
                                    </body>
                                </html>
                            </q:content>
                            <q:cpc>0.02</q:cpc>
                        </q:impression>
                        <q:app_stats>
                                <q:total><q:ctr>0.023809523809523808</q:ctr><q:ecpm>0.5952380952380952</q:ecpm></q:total>
                                <q:today><q:ctr>0.043478260869565216</q:ctr><q:ecpm>1.0869565217391306</q:ecpm></q:today>
                        </q:app_stats>
                    </q:response>';
Sandeepan Nath
  • @hakre - no the foreach does not execute – Sandeepan Nath Jul 12 '11 at 14:05
  • The code block inside the foreach does not execute because the DOMNodeList is empty. The foreach does execute, but as there are no elements to iterate over, the code block inside is skipped. I suggest you put it into a variable of its own first to make it easier to debug. – hakre Jul 12 '11 at 14:07
  • @hakre, yes I meant the same... but wrote it wrong – Sandeepan Nath Jul 12 '11 at 14:08
  • No problem, just wanted to make that clear. The function works as it should, but you're not creating the document properly, see my answer. – hakre Jul 12 '11 at 14:12
  • Where do you get that XML from? Did you write it yourself? – hakre Jul 12 '11 at 14:18
  • It's the response I am getting from the qriously API (http://www.qriously.com/) – Sandeepan Nath Jul 12 '11 at 14:22
  • You need to urlencode the iframe source. $iframe_src = urlencode($big_nasty_url_string); Then concat it to your iframe src in the XML. – Aaron Ray Jul 12 '11 at 15:00
  • @Aaron Ray, yes you are correct. Now the question is how do I grab that URL when I get it as a response from the API and urlencode() it? In order to grab it I need to load the XML, and that is where I am getting stuck. You see the loop? :) – Sandeepan Nath Jul 12 '11 at 15:07
  • @Sandeepan Nath: Well if the actual question is answered, I suggest you do a new question for the new problem. Otherwise things get mixed and are harder to resolve. – hakre Jul 12 '11 at 15:21

1 Answer

I have no problem getting it to work. The only error I could find is that the XML you are loading contains an HTML chunk that is not well-formed XML, which breaks the document: the meta elements in the head section are not closed.

See Demo.

Tip: Always activate error logging and reporting, and check for warnings and notices while you develop and debug code. A short one-liner that displays all kinds of PHP error messages, including warnings, notices and strict standards:

error_reporting(-1); ini_set('display_errors', 1);

With that in place, DOMDocument reports malformed elements as warnings when loading XML.
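If displaying warnings is not convenient, the parser messages can also be collected in code. The following is a minimal sketch of that alternative, using libxml_use_internal_errors() and libxml_get_errors() instead of display_errors; the malformed sample string and the variable names are assumptions made for illustration, not the actual API response.

```php
<?php
// Collect parser messages internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Deliberately malformed sample: <meta> is never closed.
$broken = '<root><meta name="viewport" content="width=device-width"></root>';

$dom = new DOMDocument;
$loaded = $dom->loadXML($broken); // false when the string cannot be parsed

$messages = array();
foreach (libxml_get_errors() as $error) {
    // Each entry is a LibXMLError carrying line, column and message.
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}
libxml_clear_errors();

var_dump($loaded);
print_r($messages);
```

An empty `$messages` array together with a `true` return value means the document was parsed without complaints.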

Fixing the XML "on the fly"

DOMDocument::loadXML() accepts only well-formed XML. If you've got HTML, you can alternatively try whether DOMDocument::loadHTML() does the job; however, it will then convert the loaded string into an X(HT)ML document, which is probably not what you're looking for.

To make a specific part of the string XML compatible before loading it, you can search for string patterns to obtain the substring that represents the HTML inside the XML, and then XML-encode that substring.

E.g. you can look for <html> and </html> as the surrounding tags, extract the substring they enclose (tags included) and put the encoded version back with substr_replace(). To encode the HTML for use as data inside the XML, use the htmlspecialchars() function; with ENT_QUOTES it escapes the five special characters (&, <, >, " and ') covered in the other SO answer.

Some mock-up code:

// Locate the embedded HTML chunk.
$htmlStart = strpos($xml, '<html>');
if (false === $htmlStart) throw new Exception('<html> not found.');
$htmlEnd = strpos($xml, '</html>', $htmlStart);
if (false === $htmlEnd) throw new Exception('</html> not found.');
// Length of the chunk including the closing tag (strlen('</html>') is 7).
$htmlLen = $htmlEnd - $htmlStart + 7;
$htmlString = substr($xml, $htmlStart, $htmlLen);
// XML-escape the chunk and put it back in place.
$htmlEscaped = htmlspecialchars($htmlString, ENT_QUOTES);
$xml = substr_replace($xml, $htmlEscaped, $htmlStart, $htmlLen);
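As a rough end-to-end sketch, the escaping step can be wrapped in a function and followed by the namespace lookup from the question. The function name escapeEmbeddedHtml() is hypothetical, and the shortened sample XML (one unclosed meta tag and a raw & in the URL) is made up for illustration; it assumes a single <html>...</html> chunk per document.

```php
<?php
/**
 * Hypothetical helper: XML-escapes the first <html>...</html> span in $xml
 * so that the whole string can be parsed as XML.
 */
function escapeEmbeddedHtml($xml)
{
    $start = strpos($xml, '<html>');
    if (false === $start) throw new Exception('<html> not found.');
    $end = strpos($xml, '</html>', $start);
    if (false === $end) throw new Exception('</html> not found.');

    $len = $end - $start + strlen('</html>');
    $escaped = htmlspecialchars(substr($xml, $start, $len), ENT_QUOTES);

    return substr_replace($xml, $escaped, $start, $len);
}

// Shortened sample with an unclosed <meta> and a raw "&" (placeholder values).
$xml = '<?xml version="1.0"?>
<q:response xmlns:q="http://api-url">
  <q:impression>
    <q:content><html><head><meta name="a" content="b"></head><body><iframe src="http://some-url?x=1&y=2"></iframe></body></html></q:content>
    <q:cpc>0.02</q:cpc>
  </q:impression>
</q:response>';

$dom = new DOMDocument;
$dom->loadXML(escapeEmbeddedHtml($xml));

// The parser decodes the entities again, so textContent holds the original HTML.
$content = $dom->getElementsByTagNameNS('http://api-url', 'content')->item(0);
$html = $content->textContent;

echo $html, "\n";
```

Because the HTML is now character data rather than markup, it comes back as one string; parsing it further (for example to pull out the iframe URL) would be a separate step.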
hakre
  • I think you are technically supposed to escape some of those HTML characters. See this article: http://stackoverflow.com/questions/1091945/where-can-i-get-a-list-of-the-xml-document-escape-characters – Aaron Ray Jul 12 '11 at 14:15
  • @Aaron Ray: Not in case it's actual X(HT)ML. – hakre Jul 12 '11 at 14:17
  • @hakre, I too am able to run the code here: http://codepad.org/XVEx5Ay4. I have closed the meta tags and get no more warnings. I don't understand why it is not running on my system. I have also set `ini_set("error_reporting", "E_ALL");` just before that part (to make sure), but no errors are displayed. – Sandeepan Nath Jul 12 '11 at 14:23
  • @hakre - no :(, check previous comment here – Sandeepan Nath Jul 12 '11 at 14:26
  • Try `error_reporting(-1); ini_set('display_errors', 1);` Then double-check your XML is correct. – hakre Jul 12 '11 at 14:29
  • To be precise, the warnings start appearing when I add the 2nd parameter in the URL in the `src` attr of the iframe. I guess some escaping of & is needed as pointed out by @Aaron Ray. If that is to be done, do I need to selectively escape for the list of characters listed in http://stackoverflow.com/questions/1091945/where-can-i-get-a-list-of-the-xml-document-escape-characters/1091953#1091953 ? How do I do that? – Sandeepan Nath Jul 12 '11 at 15:02
  • By the way, why does error_reporting("E_ALL"); not display all errors? Am I missing something? – Sandeepan Nath Jul 12 '11 at 15:15
  • @Sandeepan Nath: Updated the answer. And about `E_ALL` - no it does not. It perhaps will again for PHP 5.4 or higher, but right now the constant name is totally misleading. That's why I normally suggest -1, which is the workaround. – hakre Jul 12 '11 at 15:20
  • @hakre please check my new question http://stackoverflow.com/q/6679281/351903 with further queries about the solution part – Sandeepan Nath Jul 13 '11 at 13:17