2

I have created this php script, to create an xml-file from my database:

<?php
header("Content-type: text/xml");
header("Expires: Mon, 26 Jul 1990 05:00:00 GMT");
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header("Cache-Control: no-store, no-cache, must-revalidate");
header("Cache-Control: post-check=0, pre-check=0", false);
header("Pragma: no-cache");

$host = "localhost";
$user = "root";
$pass = "root";
$database = "flexapp";

$charToReplace = array("é", "è", "ê");
$charReplacements = array("e", "e", "e");

$linkID = mysql_connect($host, $user, $pass) or die("Could not connect to host.");
mysql_select_db($database, $linkID) or die("Could not find database.");

$query = "SELECT * FROM artists";
$artist_result = mysql_query($query, $linkID) or die("Data not found.");

$xml_output = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
$xml_output .= "<artists>\n";

for($x = 0 ; $x < mysql_num_rows($artist_result) ; $x++){
    $itemrow = mysql_fetch_assoc($artist_result);
    $xml_output .= "\t<artist>\n";
    $xml_output .= "\t\t<id>" . $itemrow['pk_artist_id'] . "</id>\n";
    $itemrow['artist_name'] = str_replace($charToReplace, $charReplacements,         $itemrow['artist_name']);
    $xml_output .= "\t\t<name>" . $itemrow['artist_name'] . "</name>\n";
    $xml_output .= "\t\t<picture>" . $itemrow['artist_pic'] . "</picture>\n";
    $xml_output .= "\t\t<twitter>" . $itemrow['artist_twitter'] . "</twitter>\n";
    $xml_output .= "\t</artist>\n";
}
$xml_output .= "</artists>\n";
echo $xml_output;
?>

I try to replace characters like é en ê to e.

But it gives this error in the browser:

This page contains the following errors:

error on line 23 at column 9: Encoding error

This is the xml that is outputted:

<?xml version="1.0" encoding="UTF-8"?>
<artists>
<artist>
    <id>1155</id>
    <name>Have Heart</name>
    <picture>http://userserve-ak.last.fm/serve/126/29086375.jpg</picture>
    <twitter></twitter>
</artist>
<artist>
    <id>1156</id>
    <name>Dead Swans</name>
    <picture>http://userserve-ak.last.fm/serve/126/4781939.jpg</picture>
    <twitter></twitter>
</artist>
<artist>
    <id>1157</id>
    <name>Nirvana</name>
    <picture>http://userserve-ak.last.fm/serve/126/3991355.jpg</picture>
    <twitter></twitter>
</artist>
<artist>
    <id>1158</id>
    <name>Touchter>
</artist>

But the last one (the name in the database is Touché Amoré) should be like all the rest, but somehow the strings aren't being replaced.

Veltar
  • 741
  • 2
  • 14
  • 30
  • Not sure if it's exactly what you're looking for, but try [this](http://stackoverflow.com/questions/1890854/how-to-replace-special-characters-with-the-ones-theyre-based-on-in-php) maybe it will help – mseancole Dec 09 '11 at 15:46
  • One thing I've fallen over with doing something similar is the MySQL connector - by default, even if the engine is storing the string in utf-8, the connector pulls it out in iso-8859-1 unless you set the charset to the connector. See mysql_set_charset() @ http://php.net/manual/en/function.mysql-set-charset.php If you've got the XML file in utf-8 and the database is in utf-8 you _shouldn't_ need to convert the 'é' type characters to 'e' – CD001 Dec 09 '11 at 16:35
  • Tried this, but it doesn't help. – Veltar Dec 09 '11 at 16:44
  • expanded on it as a full answer - feel free to ignore if you've sorted it though ;) – CD001 Dec 09 '11 at 16:47

4 Answers4

3

This may be an issue with encoding the data before printing. Try using utf8_encode() and htmlentities()

function xmlencode($data) {
    $data = utf8_encode($data);
    $data = htmlentities($data);
    return $data;
}

and then

...
$xml_output .= "\t\t<picture>" . xmlencode($itemrow['artist_pic']) . "</picture>\n";
...
dchrastil
  • 582
  • 3
  • 5
  • ` <![CDATA[ Touché Amoré ]]> ` Close, it creates the full xml-file this time, but as you can see, still not complete ok. – Veltar Dec 09 '11 at 16:38
  • 1
    By removing the line `$data = htmlentities($data);` it works, thank you! – Veltar Dec 09 '11 at 16:46
  • 1
    great, glad that helped :) So you used it in combination with the CDATA[] from rifat? That should do it – dchrastil Dec 09 '11 at 16:53
  • yeah, don't use htmlentities(); HTML entities !== XML entities; technically in standard XML there are only 5 entities. Conveniently, they're the same as used by htmlspecialchars() so that's a better function to use. There's a slight proviso with utf8_encode() as well - it **only** converts from iso-8859-1 to utf-8 (I've fallen over that with the € symbol and Windows-1252 before). – CD001 Dec 09 '11 at 16:56
  • I faced a similar problem and i solved it by running the below query before outputting the xml data: **mysql_query("SET NAMES 'utf8'");** – shasi kanth Jan 05 '12 at 08:55
0

Why don't you just wrap name variable with CDATA like

header('Content-Type: text/xml, charset=utf-8');

<name><![CDATA[Some very wired name]]></name>

And, If I were you then I will use SimpleXML to generate the XML.

Rifat
  • 7,628
  • 4
  • 32
  • 46
0

I am guessing the characters are saved as entities, which is why it will not replace and will 'destroy' your XML output.

Your scripts works like a charm over here and outputs as expected. Will you check your database via phpmyadmin if the saved value contains é instead of é? In that case you should decode the entities first to the characters they represent: http://php.net/manual/en/function.html-entity-decode.php

Dennis Jamin
  • 398
  • 1
  • 10
0

If your database is in utf-8 and your XML file is in utf-8; you shouldn't need to do any conversion for "foreign" characters like é.

First off, ensure your database tables are stored in utf-8 and then fix your XML file to utf-8...

header("Cache-Control: post-check=0, pre-check=0", false);
header("Pragma: no-cache");
header("Content-type:text/xml;charset=utf-8"); //  <- add this line

Next - make sure that your MySQL connector is retrieving the utf-8 data from the database in utf-8 (it doesn't by default).

$linkID = mysql_connect($host, $user, $pass) or die("Could not connect to host.");
mysql_select_db($database, $linkID) or die("Could not find database.");
mysql_set_charset("UTF8"); //  <-- add this line; NOTE MySQL doesn't use the hyphen in the utf-8 string

In theory now, all your é, ê and ñ (and so on) characters should appear properly inside the XML document without any conversions or CDATA parsing required.

The only characters you'll need to convert are, IIRC, the same as those converted by htmlspecialchars namely <, >, & and (optionally) quotes and apostrophes.

CD001
  • 8,332
  • 3
  • 24
  • 28