3

I'm trying to make a web service in PHP for an app to communicate with that will get data from a database and put it into XML format for the app. One of the columns, however, contains HTML and needs to be outputted (I think) as CDATA. I'm having trouble accomplishing this though. Please advise

<?php
mysql_connect(DB_HOST, DB_USER, DB_PASSWORD);
mysql_select_db(DB_NAME);

$sql = "SELECT post_date_gmt, post_content, post_title FROM [schema].wp_posts WHERE post_status = \"publish\" && post_type = \"post\" ORDER BY post_date_gmt DESC;";
$res = mysql_query($sql);

$xml = new XMLWriter();

$xml->openURI("php://output");
$xml->startDocument();
$xml->setIndent(true);

$xml->startElement('BlogPosts');

while ($row = mysql_fetch_assoc($res)) {

    $xml->startElement("Post");

    $xml->startElement("PostDate");
    $xml->writeRaw($row['post_date_gmt']);
    $xml->endElement();

    $xml->startElement("PostTitle");
    $xml->$writeRaw($row['post_title']);
    $xml->endElement();

    $xml->startCData("PostContent");
    $xml->writeCData($row['post_content']);
    $xml->endCData();

    $xml->endElement();

}

$xml->endElement();

header('Content-type: text/xml');
$xml->flush();

?>

Thank you very much in advance for any assistance you could offer!

hakre
Kirkland
  • `$xml->$writeRaw` - the second "`$`" is most likely in error? – hakre Oct 09 '14 at 12:51
  • I've gotta be honest with you, I was originally trying to use JSON encoding since last Thursday and was having an atrocious time creating it and then again when trying to parse it. I feel more comfortable with XML so I just went back to it until I can get a grasp of NSJSONSerialization and writing the code to make warning-free JSON. – Kirkland Oct 09 '14 at 15:32

3 Answers

5

Do not use XMLWriter::writeRaw() unless you really want to write XML fragments directly. "Raw" means that the library will not do any escaping.

The correct way to write text into the XML document is XMLWriter::text().

$xml->startElement('PostTitle');
$xml->text('foo & bar');
$xml->endElement();

Output:

<?xml version="1.0"?>
<PostTitle>foo &amp; bar</PostTitle>

If you used XMLWriter::writeRaw() in this example, the result would contain an unescaped & and would not be well-formed XML.
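
To see the difference side by side, here is a minimal sketch that writes the same string once with text() and once with writeRaw(). The helper name render_title() is made up for this illustration, and openMemory() is used instead of php://output only so that the two results can be compared as strings.

```php
<?php
// Hypothetical helper: writes $value into a <PostTitle> element,
// either escaped via text() or unescaped via writeRaw().
function render_title(string $value, bool $raw): string
{
    $xml = new XMLWriter();
    $xml->openMemory();             // buffer in memory instead of php://output
    $xml->startElement('PostTitle');
    if ($raw) {
        $xml->writeRaw($value);     // written as-is, no escaping
    } else {
        $xml->text($value);         // special characters get escaped
    }
    $xml->endElement();

    return $xml->outputMemory();
}

$escaped = render_title('foo & bar', false);
$raw     = render_title('foo & bar', true);

echo $escaped, "\n"; // <PostTitle>foo &amp; bar</PostTitle>
echo $raw, "\n";     // <PostTitle>foo & bar</PostTitle>
```

Only the first line is well-formed XML; the second contains a bare & that a parser will reject.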

CDATA sections are character nodes, not unlike text nodes, but they allow special characters without escaping and preserve whitespace. You always have to create the element node separately. An element node can contain multiple other nodes, even multiple CDATA sections.

XMLWriter has two ways to create CDATA sections:

A single method:

$xml->startElement("PostContent");
$xml->writeCData('<b>post</b> content');
$xml->endElement();

Output:

<?xml version="1.0"?>
<PostContent><![CDATA[<b>post</b> content]]></PostContent>

Or start/end methods:

$xml->startElement("PostContent");
$xml->startCData();
$xml->text('<b>post</b> content');
$xml->text(' more content');
$xml->endCData();
$xml->endElement();

Output:

<?xml version="1.0"?>
<PostContent><![CDATA[<b>post</b> content more content]]></PostContent>
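
Applied to the loop from the question, these pieces could be combined roughly as follows. This is only a sketch: the $rows array is hypothetical sample data standing in for the database result, and the output is buffered with openMemory() so it can be inspected before it is sent. writeElement() is used as the shorthand for startElement(), text() and endElement().

```php
<?php
// Hypothetical sample rows standing in for the database result set.
$rows = [
    [
        'post_date_gmt' => '2014-10-09 12:00:00',
        'post_title'    => 'Foo & Bar',
        'post_content'  => '<p>Hello <b>world</b></p>',
    ],
];

$xml = new XMLWriter();
$xml->openMemory();
$xml->setIndent(true);
$xml->startDocument();
$xml->startElement('BlogPosts');

foreach ($rows as $row) {
    $xml->startElement('Post');

    // Plain text values: written escaped.
    $xml->writeElement('PostDate', $row['post_date_gmt']);
    $xml->writeElement('PostTitle', $row['post_title']);

    // HTML content: element first, then the CDATA section inside it.
    $xml->startElement('PostContent');
    $xml->writeCData($row['post_content']);
    $xml->endElement();

    $xml->endElement(); // Post
}

$xml->endElement(); // BlogPosts
$xml->endDocument();

$output = $xml->outputMemory();
echo $output;
```

In a real script the header('Content-type: text/xml') call would go before the echo.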
ThW
  • Thank you very much for your response! I've added your changes and unfortunately I'm getting an error when it gets to the $xml->text($row=['post_title']); The new code for this segment is: $xml->startElement("PostTitle"); $xml->$text($row['post_title']); $xml->endElement(); It prints the date perfectly fine using the same code, so I'm not sure what's wrong here. Could you please help me with this last bit? – Kirkland Oct 09 '14 at 17:24
  • Ok, I copied and pasted the working segment and now it works, but only sometimes. For some reason it only starts, populates, and ends the post_title element part of the time. That column in the query is always populated, so I still don't know whats going on with it. – Kirkland Oct 09 '14 at 17:42
  • `$xml->$text($row['post_title']);` has one `$` too many. It should be `$xml->text($row['post_title']);` – ThW Oct 09 '14 at 19:19
  • ThW, they need a flag for awesomeness button for you! Thank you so much for your help! I have one more question regarding this and then I think I'll be set so far as the PHP aspects of my program goes, but seeing as it's not related to the original post, I'm going to make it a new question. I would love it if you would be willing to assist me with it. You should be able to see it on my profile in a bit. – Kirkland Oct 09 '14 at 19:57
0

You can just add it to the elements you need wrapped with CDATA like this:

 $xml->writeRaw('<![CDATA['.$row['post_date_gmt'].']]>');
Ole Haugset
  • This can output invalid XML - `&` for example still needs to be escaped in CDATA sections. – ThW Oct 09 '14 at 11:27
  • Why would you need to escape the & character exactly? If I test this code without escaping it, it still works. – Ole Haugset Oct 09 '14 at 11:51
  • Well, in case `$row['post_date_gmt']` (which might not but could and that's the point as it could represent any variable data) contains "`]]>`" in there somewhere, this is just plainly broken. Next to that it's not really clever: Using **XMLWriter** and assuming the problem has not been solved already would render using **XMLWriter** superfluous. That is also some kind of degradation for the OP asking the question. The correct answer would have been: `$xml->writeCData($row['post_date_gmt']);` - because it wraps it already. No need to re-invent the wheel. – hakre Oct 09 '14 at 12:49
  • I was mistaken, sorry. You cannot escape anything inside CDATA. As hakre noted, `]]>` can create broken XML. DOM splits the CDATA section in this case. – ThW Oct 09 '14 at 13:35
0

The answer by ThW is overall thoughtful and the way to go. It explains well how the interface of XMLWriter in PHP is meant to be used.

Credit also goes to him for a large part of the work behind this more detailed answer, as we discussed the question yesterday in chat.

There are some constraints on CDATA in XML, however, that also apply to the two ways of using XMLWriter for CDATA outlined there:

The string ']]>' cannot be placed within a CDATA section, therefore, nested CDATA sections are not allowed (well-formedness constraint).

From: CDATA Section - compare 2.7 CDATA Sections

Normally XMLWriter accepts string data that has not been encoded yet. E.g. if you pass some text, it will be written properly encoded (unless you use the aforementioned XMLWriter::writeRaw).

But if you start a CDATA section and then write text, or you write CDATA directly, the string passed must neither end the CDATA section nor contain another complete one. That means it cannot contain the character sequence "]]>", as this would end the CDATA section prematurely.

So the responsibility for passing valid data to XMLWriter remains with the user of those methods.
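
A quick way to see the problem is to write such a string and then try to parse the result again. The following is a small sketch rather than anything XMLWriter provides itself; it assumes the DOM extension is available and uses DOMDocument::loadXML() purely as a well-formedness check. With the pass-through behaviour described in this answer, the parse is expected to fail.

```php
<?php
// Write a string containing "]]>" through writeCData() without preparing it.
$xml = new XMLWriter();
$xml->openMemory();
$xml->startElement('PostContent');
$xml->writeCData('a ]]> b');
$xml->endElement();
$fragment = $xml->outputMemory();

// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$wellFormed = $doc->loadXML($fragment);
libxml_clear_errors();

echo $fragment, "\n";
var_dump($wellFormed);
```

The first "]]>" closes the section early, so the remainder ends up as ordinary character data followed by a stray "]]>".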

It is normally trivial to do so (for single-octet, US-ASCII based character encodings and for UTF-8). Here is some example code:

/**
 * prepare text for CDATA section to prevent invalid or nested CDATA
 *
 * @param $string
 *
 * @return string
 * @link http://www.w3.org/TR/REC-xml/#sec-cdata-sect
 */
function xmlwriter_prepare_cdata_text($string) {
    return str_replace(']]>', ']]]]><![CDATA[>', (string) $string);
}

And a usage example:

$xml = new XMLWriter();
$xml->openURI("php://output");
$xml->startDocument();

$xml->startElement("PostContent");
$xml->writeCData(xmlwriter_prepare_cdata_text('<![CDATA[Foo & Bar]]>'));
$xml->endElement();

$xml->endDocument();

Exemplary output:

<?xml version="1.0"?>
<PostContent><![CDATA[<![CDATA[Foo & Bar]]]]><![CDATA[>]]></PostContent>

Incidentally, DOMDocument already does something very similar under the hood:

$dom = new DOMDocument();
$dom->appendChild(
    $dom->createElement('PostContent')
);
$dom->documentElement->appendChild(
    $dom->createCdataSection('<![CDATA[Foo & Bar]]>')
);
$dom->save("php://output");

Output:

<?xml version="1.0"?>
<PostContent><![CDATA[<![CDATA[Foo & Bar]]]]><![CDATA[>]]></PostContent>
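
One way to convince yourself that splitting the section loses nothing is a round trip: write the prepared string with XMLWriter, parse it again and compare the text content with the original. This is a sketch that repeats xmlwriter_prepare_cdata_text() from above so it runs on its own, and it assumes the DOM extension is loaded.

```php
<?php
// Same helper as above, repeated so this snippet is self-contained.
function xmlwriter_prepare_cdata_text($string) {
    return str_replace(']]>', ']]]]><![CDATA[>', (string) $string);
}

$original = '<![CDATA[Foo & Bar]]>';

// Write the prepared string as CDATA.
$xml = new XMLWriter();
$xml->openMemory();
$xml->startElement('PostContent');
$xml->writeCData(xmlwriter_prepare_cdata_text($original));
$xml->endElement();
$fragment = $xml->outputMemory();

// Parse it back; adjacent CDATA sections are concatenated in textContent.
$doc = new DOMDocument();
$doc->loadXML($fragment);
$roundTripped = $doc->documentElement->textContent;

var_dump($roundTripped === $original); // bool(true)
```

The element ends up with two CDATA sections, but a consumer reading its text content gets the original string back unchanged.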

To understand technically why XMLWriter in PHP behaves this way, you need to know that XMLWriter is based on the libxml2 library. For most of its work, the PHP extension simply passes the calls through to libxml:

PHP's xmlwriter_write_cdata delegates to libxml's xmlTextWriterWriteCDATA, which performs the expected sequence of xmlTextWriterStartCDATA, xmlTextWriterWriteString and xmlTextWriterEndCDATA.

xmlTextWriterWriteString is used in many routines (e.g. when writing a PI), but the content string is encoded only in some text-writing cases:

  • Name,
  • Text and
  • Attribute.

For all others, it's passed as-is. This includes CDATA, so the data passed to XMLWriter::writeCData must match the requirements for XML CData (because that is written by that method):

  • [20] CData ::= (Char* - (Char* ']]>' Char*))

Which is technically saying: Any string not containing "]]>".

This is easy to overlook. I myself suspected yesterday that this could be a bug, and I'm not the only one: there is a related bug report on PHP.net from years ago: https://bugs.php.net/bug.php?id=44619

See also: What does <![CDATA[]]> in XML mean?

hakre