
I'm creating an RSS feed for a site.

I am using SimpleXML to create the XML structure. When I call $xml->asXML(), it emits many warnings:

ErrorException [ Warning ]: SimpleXMLElement::asXML() [simplexmlelement.asxml]: string is not in UTF-8

I'm not sure what this warning means. The database table it reads from uses the utf8_general_ci collation. I tried running utf8_encode on the strings, which garbled them instead of fixing the problem.

//First create the XML root
$xml = new SimpleXMLElement('<rss version="2.0"></rss>');

//Create the Channel
$channel = $xml->addChild('channel');

//Construct the feed parameters
$channel->addChild('title', 'CGarchitect Feed');
$channel->addChild('link', Config::get('base_url'));
$channel->addChild('description', 'CGarchitect is the leading online community for architectural visualization professionals.');
$channel->addChild('pubDate', date("D, d M Y H:i:s T"));

//Get the feed items

$nodes = <....snip... >

foreach ($nodes as $node)
{

    //Parse the title and description
    $title = htmlentities(strip_tags($node->title));
    $description = htmlentities(strip_tags($node->description));
    $newItem = $channel->addChild('item');
    $newItem->addChild('title', $title);
    $newItem->addChild('description', $description);
    $newItem->addChild('pubDate', date("D, d M Y H:i:s T", $node->published_at));

}

header('Content-Type: application/xhtml+xml');
echo $xml->asXML();

Thanks in advance...

Leonard

Leonard Teo
  • Did you set the [MySql connection encoding](http://dev.mysql.com/doc/refman/5.5/en/charset-connection.html) to UTF8 as well? – Jon Jan 09 '12 at 23:47
  • @Jon Yes. mysql_client_encoding() returns 'utf8' – Leonard Teo Jan 09 '12 at 23:52
  • Are you sure that you are using a UTF-8 connection to the database? Run this query first, right after you establish the connection: mysql_query("SET NAMES 'utf8'"); – Michael Walter Jan 09 '12 at 23:54
  • I added the above code with the same result. As mentioned, I ran mysql_client_encoding() and it returns utf8. – Leonard Teo Jan 09 '12 at 23:58
  • atxdba has a good hint for you. The problem is that htmlentities() does not work in UTF-8 by default; the default charset is ISO-8859-1, so you have to change it. Use it like this: htmlentities($string, ENT_NOQUOTES, 'UTF-8'); ENT_NOQUOTES means that no quotes will be replaced. For other values, check the manual: [htmlentities()](http://php.net/manual/en/function.htmlentities.php) – Michael Walter Jan 10 '12 at 00:22
  • re: utf8_general_ci http://stackoverflow.com/a/1036459/183677 ... "very broken", heh. If you posted the actual string that is failing, you could check it against some utility, e.g. http://hexutf8.com/?q=c2a9981a800. I'd guess that MySQL is storing some malformed UTF-8 bytes and SimpleXMLElement does not like it. – jar Sep 08 '16 at 16:31
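
To check locally whether a string is still valid UTF-8 before it reaches SimpleXML, something like the following may help. It is only a sketch: it assumes the mbstring extension is available, borrows the sample title from the answer's reproduction case, and passes 'ISO-8859-1' explicitly to imitate the old htmlentities() default mentioned in the comments.

```php
<?php
// Sample non-ASCII UTF-8 text (taken from the reproduction case in the answer).
$raw = "(╯°□°)╯︵ ┻━┻";

// Treating UTF-8 bytes as ISO-8859-1 turns some bytes into named entities
// and leaves others untouched, which breaks the multi-byte sequences.
$mangled = htmlentities($raw, ENT_COMPAT, 'ISO-8859-1');

var_dump(mb_check_encoding($raw, 'UTF-8'));     // bool(true)
var_dump(mb_check_encoding($mangled, 'UTF-8')); // bool(false)
```

If the value straight from the database already fails this check, the problem is in the stored data or the connection; if only the processed value fails, the processing step is the culprit.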

1 Answer


I was able to reproduce your problem by replacing your $nodes ... snippet with:

class myNode {

    public $title="(╯°□°)╯︵ ┻━┻";
    public $description="dscr";
    public $published_at=0;

    public function __construct(){
        $this->published_at=time();
    }

}

$nodes = array(new myNode());

Simply removing the calls to htmlentities seemed to work fine: the warnings went away, and the non-ASCII characters in the output were properly escaped as character entities.
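
If some escaping is still needed (for example, for a bare & in the source text), one option is to escape only the XML-special characters and state the input charset explicitly. This is a minimal sketch, not the only way to do it: $rawTitle is a made-up stand-in for $node->title, and the ENT_XML1 flag assumes PHP 5.4 or later.

```php
<?php
// Hypothetical sample value standing in for $node->title from the question.
$rawTitle = '<b>Tables & chairs</b> (╯°□°)╯︵ ┻━┻';

$xml     = new SimpleXMLElement('<rss version="2.0"></rss>');
$channel = $xml->addChild('channel');
$item    = $channel->addChild('item');

// Strip markup, then escape only &, <, > and quotes, treating the input as UTF-8.
// Unlike htmlentities(), this never produces HTML-only named entities.
$title = htmlspecialchars(strip_tags($rawTitle), ENT_QUOTES | ENT_XML1, 'UTF-8');
$item->addChild('title', $title);

echo $xml->asXML();
```

The multi-byte characters pass through htmlspecialchars untouched, so the string handed to addChild stays valid UTF-8.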

atxdba
  • Ok this works. I was running htmlentities because I had some entries in the database that had the & character without it being &amp;.... – Leonard Teo Jan 10 '12 at 00:22
  • Alternatively, you could specify the charset for htmlentities, like htmlentities(strip_tags($node->title), ENT_COMPAT, 'utf-8'); That works for me too. – atxdba Jan 10 '12 at 00:49
  • Thanks. For the record, I was unable to get it working with htmlentities as I kept getting Entity 'lsquo' not defined. I switched to using htmlspecialchars and it worked... – Leonard Teo Jan 10 '12 at 12:01
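
The difference reported in the last comment can be seen by comparing the two functions on the same input. This is a small sketch with a made-up sample string: htmlentities converts typographic quotes to HTML named entities such as &lsquo;, which XML does not define, while htmlspecialchars only touches the characters that are special in markup.

```php
<?php
// Hypothetical sample text containing curly quotes and a bare ampersand.
$text = "‘quoted’ & more";

// HTML named entities: &lsquo; and &rsquo; are not predefined in XML.
echo htmlentities($text, ENT_COMPAT, 'UTF-8'), "\n";
// prints: &lsquo;quoted&rsquo; &amp; more

// Only &, <, > and double quotes are escaped; the curly quotes stay as UTF-8.
echo htmlspecialchars($text, ENT_COMPAT, 'UTF-8'), "\n";
// prints: ‘quoted’ &amp; more
```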