
I have code that creates an XML document; my only problem is with the encoding of words like á, olá and ção.
These characters don't appear correctly, and when I try to read the XML I get an error relating to that character.

$dom_doc = new DOMDocument("1.0", "utf-8");
$dom_doc->preserveWhiteSpace = false;
$dom_doc->formatOutput = true;
$element = $dom_doc->createElement("hotels");

while ($row = mysql_fetch_assoc($result)) {
    // Element names cannot start with a digit, so prefix the numeric id.
    $contact = $dom_doc->createElement("m" . $row['id']);

    $nome = $dom_doc->createElement("nome", $row['nome']);

    $data1 = $dom_doc->createElement("data1", $row['data']);
    $data2 = $dom_doc->createElement("data2", $row['data2']);

    $contact->appendChild($nome);
    $contact->appendChild($data1);
    $contact->appendChild($data2);

    $element->appendChild($contact);
}

// Attach the root element once, after all rows have been added.
$dom_doc->appendChild($element);

What can I change to fix this problem? I am already using UTF-8.

  • Can you show the errors you're getting, and/or what the characters look like when you open the XML in an editor? – flup Feb 13 '13 at 14:05
  • I get parse errors, and then weird characters in their place –  Feb 13 '13 at 14:16
  • It'd be good to see the actual weird characters and the errors – flup Feb 13 '13 at 14:30
  • See http://stackoverflow.com/questions/2790027/utf-8-character-encoding-battles-json-encode I think the data fetched from the database needs to be converted to UTF-8 before you create an Element from it. – flup Feb 13 '13 at 14:34

2 Answers


Try putting 'á', 'olá' or 'ção' directly in your script:

$data1 = $dom_doc->createElement("data1", 'ção');
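A fuller, self-contained version of that test might look like the sketch below. It assumes the .php file itself is saved as UTF-8 (otherwise the literal is not UTF-8 either), and it also reads the generated XML back with loadXML(), since reading the XML is where the question reports the errors.

```php
<?php
// Minimal sketch: the literal 'ção' is only valid UTF-8 if this
// .php file is itself saved as UTF-8 (an assumption).
$dom_doc = new DOMDocument("1.0", "utf-8");
$dom_doc->formatOutput = true;

$element = $dom_doc->createElement("hotels");
$element->appendChild($dom_doc->createElement("data1", 'ção'));
$dom_doc->appendChild($element);

$xml = $dom_doc->saveXML();
echo $xml;

// Read the XML back: this is the step that fails when the bytes are bad.
$check = new DOMDocument();
$loaded = $check->loadXML($xml);
echo $loaded ? $check->getElementsByTagName("data1")->item(0)->textContent : "parse error", "\n";
```

If this prints the characters correctly and loads without errors, the DOM code is fine and the database values are the likely suspect.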

If that works without problems, the data you get from MySQL is probably wrongly encoded. Are you sure MySQL outputs correct UTF-8?

To find out, have PHP dump your data into an HTML document with the meta charset set to UTF-8, and see whether the characters display correctly.
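A minimal version of that check is sketched below. The helper name dumpRowsAsHtml and the sample rows are hypothetical; in practice you would feed it the rows from the mysql_fetch_assoc() loop. The ENT_SUBSTITUTE flag makes htmlspecialchars() replace invalid UTF-8 with the visible replacement character U+FFFD instead of returning an empty string, so broken values stand out in the browser.

```php
<?php
// Hypothetical helper: renders the 'nome' column into a tiny HTML page
// that declares UTF-8, so the browser decodes the bytes as UTF-8.
function dumpRowsAsHtml(array $rows): string
{
    $html = "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\"></head><body>\n";
    foreach ($rows as $row) {
        // ENT_SUBSTITUTE turns invalid UTF-8 sequences into U+FFFD
        // rather than making htmlspecialchars() return an empty string.
        $html .= "<p>" . htmlspecialchars($row['nome'], ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . "</p>\n";
    }
    return $html . "</body></html>\n";
}

// Made-up rows standing in for mysql_fetch_assoc() results:
// the first is valid UTF-8, the second is 'ção' as ISO-8859-1 bytes.
$page = dumpRowsAsHtml([['nome' => 'ção'], ['nome' => "\xE7\xE3o"]]);
echo $page;
```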

You can also call:

$data1 = $dom_doc->createElement("data1", mb_detect_encoding($row['data']));

and see what encoding is detected by PHP for your data.
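mb_detect_encoding() is a heuristic guess. As a stricter complement (a different mbstring function from the one named above), mb_check_encoding() gives a plain yes/no answer to "is this valid UTF-8?". The helper describeValue is hypothetical, and the Latin-1 byte string is sample data, not output from the asker's database.

```php
<?php
// Hypothetical helper: reports whether a value is valid UTF-8 and,
// if not, shows the raw bytes so you can see what the database sent.
function describeValue(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return 'valid UTF-8';
    }
    return 'not UTF-8, bytes: ' . bin2hex($value);
}

echo describeValue('ção'), "\n";        // valid UTF-8
echo describeValue("\xE7\xE3o"), "\n";  // not UTF-8, bytes: e7e36f
```

In the question's loop you could call it on $row['nome'], $row['data'] and $row['data2'] to find the offending columns.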

If you can't convert the data in your database or change its settings, you can use mb_convert_encoding to do it on the fly: http://www.php.net/manual/en/function.mb-convert-encoding.php
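Converting on the fly could look like the sketch below. It assumes the database hands back ISO-8859-1 (Latin-1) bytes, which is only a guess; check the actual table and connection character sets first. toUtf8 is a hypothetical helper, and it uses mb_check_encoding() so values that are already valid UTF-8 are not converted twice. A Latin-1 string can occasionally happen to be valid UTF-8, so that check is a heuristic, not a guarantee.

```php
<?php
// Hypothetical helper: assumes non-UTF-8 values are ISO-8859-1.
// Change $fromEncoding if your data uses another encoding.
function toUtf8(string $value, string $fromEncoding = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, leave it alone
    }
    return mb_convert_encoding($value, 'UTF-8', $fromEncoding);
}

$dom_doc = new DOMDocument("1.0", "utf-8");
$element = $dom_doc->createElement("hotels");

// Made-up stand-in for one mysql_fetch_assoc() row: 'ção' in Latin-1 bytes.
$row = ['id' => 1, 'nome' => "\xE7\xE3o"];

$contact = $dom_doc->createElement("m" . $row['id']);
$contact->appendChild($dom_doc->createElement("nome", toUtf8($row['nome'])));
$element->appendChild($contact);
$dom_doc->appendChild($element);

echo $dom_doc->saveXML();
```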

ofaurax

You are using UTF-8, the 8-bit Unicode Transformation Format. Even though it properly supports all 1,112,064 code points in Unicode, it's possible that there is an issue here.
Just an idea: try UTF-16 as the encoding instead. See below:

$dom_doc = new DOMDocument("1.0", "utf-16");

OR

$dom_doc = new DOMDocument("1.0", "ISO-10646");
Craig Taub
  • If the UTF-16 approach worked, then your DB might be filled with data coming from a Windows system. I've heard Windows uses UTF-16 by default... – ofaurax Feb 13 '13 at 14:29
  • The bit about UTF-8 and the number of code points is irrelevant. – flup Feb 13 '13 at 14:29
  • UTF-16 is not really a nice encoding, though; better to convert the data to UTF-8 before inserting it into the XML – BeniBela Feb 13 '13 at 14:39
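BeniBela's suggestion can be sketched as follows. The PHP manual notes that the DOM extension uses UTF-8, so whatever encoding the DOMDocument declares, strings passed to createElement() should already be UTF-8. The sketch assumes the stored text is UTF-16 little-endian (use 'UTF-16BE' if it is big-endian); the byte string is made-up sample data.

```php
<?php
// Assumption: the source text is UTF-16LE. This is 'ção' in UTF-16LE.
$utf16le = "\xE7\x00\xE3\x00o\x00";

// Convert to UTF-8 before handing the string to the DOM.
$utf8 = mb_convert_encoding($utf16le, 'UTF-8', 'UTF-16LE');

$dom_doc = new DOMDocument("1.0", "utf-8");
$dom_doc->appendChild($dom_doc->createElement("nome", $utf8));
echo $dom_doc->saveXML();
```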