
Here is a simple PHP script that parses an XML document and shows an attribute of the root element (the attribute value is in Russian, and the XML file uses the "utf-8" charset):

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<?php
    //header('Content-Type: text/html; charset=utf-8');
    $xml=simplexml_load_file('output.xml');
    echo $xml['moves'];
?>
</body>
</html>

My XML:

<?xml version="1.0" encoding="UTF-8"?>
<game moves="Папа"> 
<a attr="2">123</a>
</game> 

Using this code, I see garbled characters instead of the Russian text "Папа". But if I remove all the HTML and set the charset with PHP's header() function, it works correctly. How can I fix it?

asked by user2078683, edited by hakre
  • Have you tried utf8_decode()? – Sam Mar 09 '13 at 12:39
  • Are you 1000% sure the XML is UTF-8 encoded? Which encoding does the browser show in its "encoding" menu when you visit the page you quote above? – Pekka Mar 09 '13 at 12:44
  • I've updated the question with my XML code. I'm sure the document is correctly encoded as "UTF-8", and utf8_decode() doesn't help. – user2078683 Mar 09 '13 at 12:54
  • Please show the response headers you get when you request that file and it looks broken. – hakre Mar 09 '13 at 13:22

2 Answers


When the document you author is HTML or XHTML, it is important to add a Doctype declaration. It may solve your problem:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
answered by Ahmed Atta

You should always double-check when you're unsure. Let's do that.

First, check that the XML file is actually UTF-8 encoded.

Second, check that the HTML you generate is actually UTF-8 encoded.
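Both checks come down to one question: is a given byte string well-formed UTF-8? A minimal sketch of that test as a reusable function relies on the fact that preg_match() with the u modifier returns false (rather than 0) when the subject is not valid UTF-8, while the empty pattern matches any valid string. The name is_utf8() is hypothetical, not a PHP built-in.

```php
<?php
// Hypothetical helper: true when $bytes is well-formed UTF-8.
// preg_match() with the /u modifier returns false on invalid UTF-8 input,
// and the empty pattern '//' matches (returns 1) on any valid string.
function is_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// Assumes this script file itself is saved as UTF-8.
var_dump(is_utf8("Папа"));              // bool(true)
// The same word as Windows-1251 bytes is not valid UTF-8.
var_dump(is_utf8("\xCF\xE0\xEF\xE0"));  // bool(false)
```

The second call passes the bytes that Windows-1251 uses for "Папа", which is exactly the kind of input that renders as garbage when a page declares it as UTF-8.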

Here is your example from above with these two checks:

<?php
ob_start();
?>
    <html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    </head>
    <body>
    <?php
    $buffer = file_get_contents('output.xml');
    // preg_match() with the /u modifier returns false for invalid UTF-8 input
    if (!preg_match('//u', $buffer)) {
        throw new Exception("XML file is not UTF-8 encoded!");
    }

    $xml = simplexml_load_string($buffer);
    echo $xml['moves'];
    ?>
    </body>
    </html>
<?php
$buffer = ob_get_clean();
if (!preg_match('//u', $buffer)) {
    throw new Exception("HTML is not UTF-8 encoded!");
}
?>
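Since the question reports that the page renders correctly once the charset is sent with header(), one way to keep the HTML is to call header() at the very top of the script, before any markup is output (header() only works while no output has been sent yet). This is a sketch that assumes the broken rendering comes from the Content-Type header the server sends: browsers give the charset in the HTTP header precedence over a meta tag, and the response headers hakre asked for would confirm whether that is the case. The XML from the question is inlined with simplexml_load_string() only so the sketch runs without output.xml, and the variable names $xmlSource and $moves are illustrative.

```php
<?php
// Send the charset in the HTTP header first: header() must run before any output.
header('Content-Type: text/html; charset=utf-8');

// Stand-in for output.xml so the sketch is self-contained;
// with the real file, use simplexml_load_file('output.xml') instead.
$xmlSource = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<game moves="Папа">
<a attr="2">123</a>
</game>
XML;

$xml = simplexml_load_string($xmlSource);
if ($xml === false) {
    throw new RuntimeException('Could not parse the XML.');
}

// SimpleXML returns text as UTF-8; escape it before embedding it in HTML.
$moves = htmlspecialchars((string) $xml['moves'], ENT_QUOTES, 'UTF-8');

// The markup is echoed from a heredoc so the whole sketch stays in PHP mode.
echo <<<HTML
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
{$moves}
</body>
</html>
HTML;
```

Keeping the meta tag as well does no harm; it still helps when the page is saved to disk and opened without HTTP headers.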
answered by hakre