
I am trying to generate an XML file in UTF-16 encoding with PHP, but there is a problem when I open the generated file. I use DOMDocument to create the file. With UTF-8 encoding there is no problem. When I open the XML file in Notepad++, it looks like this:

    <?xml version="1.0" encoding="UTF-16"?>਍㰀伀䈀㸀ഀ
    <CLIENT>਍    㰀䈀伀䴀㸀ഀ
      <BO>਍        㰀䄀搀洀䤀渀昀漀㸀ഀ
          <Object>2</Object>਍          㰀嘀攀爀猀椀漀渀㸀㈀㰀⼀嘀攀爀猀椀漀渀㸀ഀ
        </AdmInfo>਍        㰀䈀甀猀椀渀攀猀猀倀愀爀琀渀攀爀猀㸀ഀ
          <row>਍            㰀䌀愀爀搀吀礀瀀攀㸀㠀㰀⼀䌀愀爀搀吀礀瀀攀㸀ഀ

...and so on. Can someone help me, please?

When I set the encoding in Notepad++ to UTF-8 without BOM, the file looks like this:

     <?xml version="1.0" encoding="UTF-16"?>਍㰀伀䈀㸀ഀ
      <CLIENT>਍    㰀䈀伀䴀㸀ഀ
          <BO>਍        㰀䄀搀洀䤀渀昀漀㸀ഀ
              <Object>2</Object>਍          㰀嘀攀爀猀椀漀渀㸀㈀㰀⼀嘀攀爀猀椀漀渀㸀ഀ
            </AdmInfo>਍        㰀䈀甀猀椀渀攀猀猀倀愀爀琀渀攀爀猀㸀ഀ
              <row>਍            㰀䌀愀爀搀吀礀瀀攀㸀㠀㰀⼀䌀愀爀搀吀礀瀀攀㸀ഀ
                <CardCode>01000001</CardCode>਍          㰀⼀爀漀眀㸀ഀ
            </BusinessPartners>਍      㰀⼀䈀伀㸀ഀ
        </BOM>਍  㰀⼀䌀䰀䤀䔀一吀㸀ഀ

Here is part of the PHP file, as requested:

    header('Content-Type: text/xml');
    //header('Content-Transfer-Encoding: binary');
    $xml = new DOMDocument();
    $xml->version = '1.0';
    $xml->encoding = 'UTF-16';
    $ob_client = $xml->createElement('OB');
        $client_element = $xml->createElement('CLIENT');
            $client_bom_element = $xml->createElement('BOM');
                $client_bo_element = $xml->createElement('BO');
                    $client_adminfo_element = $xml->createElement('AdmInfo');
                        $client_adminfo_object_element = $xml->createElement('Object', '2');
                        $client_adminfo_version_element = $xml->createElement('Version', '2');

                    $client_BusinessPartners_element = $xml->createElement('BusinessPartners');
                        $client_BusinessPartners_row_element = $xml->createElement('row');
                            $client_BusinessPartners_row_cardtype_element = $xml->createElement('CardType', $_XML_CardType);
                            $client_BusinessPartners_row_cardcode_element = $xml->createElement('CardCode', $_XML_CardCode);

    ...
    $xml->formatOutput = true;
    echo $xml->saveXML();
    $xml->save('rudy-xml-particulier'.$commandeId.'.xml');

Thanks a lot.

edited by hakre; asked by Pureandfast
  • Doesn't Notepad++ have an option to view the file using another encoding? I'm sure it does, but I can't check on my Mac. – Husman Mar 04 '13 at 12:12
  • I just updated my post, but it still fails. – Pureandfast Mar 04 '13 at 12:19
  • Perhaps it's because I cannot speak Chinese, but I can't spot the difference between the two files. Additionally, are you sure your PHP code is correct? – Álvaro González Mar 04 '13 at 12:22
  • If I set UTF-8 encoding, there is no error! – Pureandfast Mar 04 '13 at 12:26
  • Show us how you apply UTF-16 encoding to the document in PHP and how you send the data. – Daniel Mar 04 '13 at 12:26
  • I've added part of my PHP file. – Pureandfast Mar 04 '13 at 12:35
  • Are you sure that all the variables and constant strings you insert into the DOM are encoded as UTF-16? If the data come from a DB, is your DB connection charset set to UTF-16? Do you put any hardcoded strings into the document? If so, are your PHP scripts UTF-16 encoded? – SWilk Mar 04 '13 at 13:07
  • So many questions, SWilk. Thank you. The data come from a database (phpMyAdmin, latin1_swedish_ci). The variables and constant strings I insert are root data, for instance: $xml->createElement('DocType', $myvar). – Pureandfast Mar 04 '13 at 13:10
  • Can someone tell me which PHP functions I can use to solve my problem, please? – Pureandfast Mar 04 '13 at 15:52
  • The first problem you have is that `DOMDocument` expects the data you pass into it to be UTF-8 encoded. What you describe shows that you are not passing UTF-8 encoded strings but latin1 encoded ones. That is not compatible. Additionally, please clarify what *"using UTF-16 encoding with PHP"* actually means to you. And you should really use DOMDocument with UTF-8, just saying: it's XML's default and it's what DOMDocument takes as string input. So please clarify your concrete issue and improve your question. – hakre Mar 05 '13 at 12:38

1 Answer


You are already generating an XML file in UTF-16. All you need to do is specify the encoding up front, which you already do:

    $xml = new DOMDocument();
    $xml->encoding = 'UTF-16';
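To see that the `encoding` property only affects the serialized output, here is a minimal sketch, assuming the DOM extension is loaded (the `CardType` element and the placeholder value `C` are only for illustration). The value goes in as a UTF-8 string, and libxml converts the document to UTF-16 when saving:

```php
<?php
// Build the document from UTF-8 strings; 'encoding' only controls serialization.
$xml = new DOMDocument('1.0');
$xml->encoding = 'UTF-16';
$xml->appendChild($xml->createElement('CardType', 'C'));

// saveXML() returns the serialized bytes in the declared encoding (UTF-16 here).
$utf16 = $xml->saveXML();

// Every ASCII character now takes two bytes, so the raw string is not readable
// as UTF-8. Loading it again lets libxml detect UTF-16 and decode it back.
$check = new DOMDocument();
$check->loadXML($utf16);
echo $check->documentElement->textContent, PHP_EOL; // prints "C"
```

So nothing beyond setting `encoding` is needed on the output side; what matters is that the strings passed in are valid UTF-8.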

So the problem more likely arises when you add data, especially element values. PHP will neither warn you nor prevent you from adding non-UTF-8 byte sequences. Here is an example that provokes exactly that:

    $_XML_CardType = "\xA9"; # non-UTF-8 byte sequence (latin-1 copyright symbol)
    $xml->createElement('CardType', $_XML_CardType); # returns DOMElement

Then, when you call

    echo $xml->saveXML();

PHP might tell you about the problem (depending on the PHP version, error reporting settings and underlying libraries) and (in newer PHP versions) cut off the output at the point where the error occurs. An example error message is:

    Warning: DOMDocument::saveXML(): output conversion failed due to conv error, bytes 0xA9 0x3C 0x2F 0x69

Therefore, all you need to do is ensure that the string you pass to createElement as the value is UTF-8 encoded. That is all.

Since you fetch the data from a database, consult the documentation of your PHP database client library on how to make it return strings in UTF-8. That should immediately solve your issue.
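If the connection charset really cannot be changed, a fallback is to convert each value to UTF-8 in PHP before inserting it. This is a sketch under assumptions, not the approach recommended above: it assumes the mbstring extension is loaded and that the stored bytes are Windows-1252, which is what MySQL's `latin1` character set actually is. `latin1ToUtf8` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: convert a string fetched over a latin1 connection to UTF-8.
// Assumes the bytes really are Windows-1252 (MySQL's "latin1").
function latin1ToUtf8(string $value): string
{
    return mb_convert_encoding($value, 'UTF-8', 'Windows-1252');
}

$_XML_CardType = "\xA9";                   // latin1 copyright sign
$utf8Value = latin1ToUtf8($_XML_CardType); // now the two-byte UTF-8 sequence C2 A9

echo bin2hex($_XML_CardType), ' -> ', bin2hex($utf8Value), PHP_EOL; // prints "a9 -> c2a9"

// The converted value can be inserted safely into a UTF-16 document.
$xml = new DOMDocument('1.0');
$xml->encoding = 'UTF-16';
$xml->appendChild($xml->createElement('CardType', $utf8Value));
```

Fixing the charset at the connection level is still preferable, because it covers every value at once instead of relying on each call site to convert.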

To make sure you actually get a UTF-8 string, test it before inserting it, for example with a regex that detects invalid UTF-8 strings:

    if (!preg_match('//u', $_XML_CardType)) {
        throw new Exception("Non-UTF-8 string detected.");
    }
    $xml->createElement('CardType', $_XML_CardType);

This throws an exception instead of inserting the invalid string. Also log or display errors and follow the error output to spot additional problems.
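The same check can be wrapped in a small reusable function. This is a sketch with a hypothetical helper (`createCheckedElement` is not part of the DOM API), assuming every value should be validated before insertion. It adds the value as a text node because, per the PHP manual, `createElement()` treats `&` in its value argument as the start of an entity reference.

```php
<?php
// Hypothetical helper: validate a value as UTF-8 before it reaches DOMDocument,
// so bad bytes fail loudly instead of breaking saveXML() later.
function createCheckedElement(DOMDocument $doc, string $name, string $value): DOMElement
{
    // With the /u modifier, preg_match() returns false on invalid UTF-8.
    if (!preg_match('//u', $value)) {
        throw new InvalidArgumentException("Non-UTF-8 value for <$name>: " . bin2hex($value));
    }
    $element = $doc->createElement($name);
    // A text node escapes "&", "<" and ">" correctly.
    $element->appendChild($doc->createTextNode($value));
    return $element;
}

$xml = new DOMDocument('1.0');
$xml->encoding = 'UTF-16';
$row = $xml->appendChild($xml->createElement('row'));
$row->appendChild(createCheckedElement($xml, 'CardCode', '01000001'));

$rejected = null;
try {
    // Raw latin1 byte, as in the example above: rejected before insertion.
    $row->appendChild(createCheckedElement($xml, 'CardType', "\xA9"));
} catch (InvalidArgumentException $e) {
    $rejected = $e->getMessage();
    error_log($rejected); // log it instead of silently writing a broken file
}
```

Failing at insertion time points at the exact field and byte, which is easier to trace than a truncated file or a conversion warning from `saveXML()`.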

answered by hakre