
I'm trying to output an XML file using PHP, and everything works except that the file that is created isn't UTF-8 encoded; it's ANSI. (I can see that when I open the file and choose Save as...) I was using

```php
$doc = new DOMDocument('1.0', 'UTF-8');
```

but I noticed that non-English characters don't appear in the output. While searching for a solution, I first tried adding

```php
header("Content-Type: application/xml; charset=utf-8");
```

at the beginning of the PHP script, but the browser says: "Extra content at the end of the document. Below is a rendering of the page up to the first error."
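If the goal is to return the XML over HTTP rather than write a file, the header has to be sent before any output, and the response body must contain the XML document and nothing else. "Extra content at the end of the document" typically means the browser's XML parser found more content after the root element's closing tag, for example stray `echo` output, PHP warnings, or HTML printed later in the script. A minimal sketch; `send_xml()` is a hypothetical helper, not part of PHP:

```php
<?php
// Hypothetical helper: send a DOMDocument as the entire HTTP response.
function send_xml(DOMDocument $doc): void
{
    // header() only works before the first byte of output has been sent;
    // otherwise PHP warns "headers already sent". (The CLI never sends headers.)
    header('Content-Type: application/xml; charset=utf-8');

    // saveXML() returns the serialized document; echo is what prints it.
    echo $doc->saveXML();
}

// Usage: build the document, then make send_xml() the last thing the
// script does, e.g.
//   $doc = new DOMDocument('1.0', 'UTF-8');
//   $doc->appendChild($doc->createElement('items'));
//   send_xml($doc);
//   exit; // nothing may be printed after the root element
```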

I've tried other suggestions, such as not passing 'UTF-8' when creating the document and setting it separately with `$doc->encoding = 'UTF-8';`, but the result was the same.

I used

```php
$doc->save("filename.xml");
```

to save the file, and I've tried changing it to

```php
$doc->saveXML();
```

but the non-English characters didn't appear. Any ideas?
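For reference, a minimal sketch of writing such a file, assuming PHP 7.0+ (for the `\u{...}` escapes) and a writable temp directory; the element names and city values are illustrative only. It also shows a difference between the two calls above: `save()` writes to disk, while `saveXML()` only returns the serialized string, so calling `$doc->saveXML();` on its own line produces neither a file nor any output.

```php
<?php
// Build a small document whose text contains non-English characters.
// The "\u{...}" escapes (PHP 7.0+) produce UTF-8 bytes regardless of
// how this source file itself is saved.
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;

$root = $doc->appendChild($doc->createElement('cities'));
foreach (["M\u{00FC}nchen", "\u{0141}\u{00F3}d\u{017A}", "S\u{00E3}o Paulo"] as $name) {
    $city = $root->appendChild($doc->createElement('city'));
    // createTextNode() escapes &, < and > for you; the second argument
    // of createElement() does not.
    $city->appendChild($doc->createTextNode($name));
}

$file = sys_get_temp_dir() . '/filename.xml';

// Option 1: save() writes the document straight to disk and returns
// the number of bytes written (false on failure).
$doc->save($file);

// Option 2: saveXML() only returns a string; on its own it writes nothing.
$xml = $doc->saveXML();
file_put_contents($file, $xml);
```

Either option yields a file that starts with `<?xml version="1.0" encoding="UTF-8"?>` and stores the city names as UTF-8, provided the strings passed to the DOM were UTF-8 to begin with.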

Sab
  • Have you tried opening the saved file in Firefox or Chrome? When you get the output page, what encoding does your browser report? – Javad May 05 '14 at 16:07
  • I'm confused. Are you trying to create an XML file, or return XML over HTTP? Most of your post seems to be about creating an XML file, but then why would you be changing the response headers? Where is the code where you're actually adding these non-English characters to the XML document? – JLRishe May 06 '14 at 07:37
  • This is actually my first post here (and my first "serious" application), so I apologize for the poor explanation of the problem. I'm trying to create an XML file, and the non-English characters are in data that comes from a database, so the problem is more likely somewhere else, not in the PHP, as the answer below says. – Sab May 06 '14 at 12:16
  • possible duplicate of [UTF-8 all the way through](http://stackoverflow.com/questions/279170/utf-8-all-the-way-through) – Álvaro González May 06 '14 at 18:59



ANSI is not a real encoding. It's a word that basically means "whatever encoding my Windows computer is configured to use". Getting ANSI is a clear sign of relying on default encoding somewhere.
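To make "relying on a default encoding" concrete: the same character is stored as different bytes depending on the encoding. The sketch below assumes Windows-1252, the usual "ANSI" code page on Western-European Windows systems; machines configured for other locales use other code pages.

```php
<?php
// "é" (U+00E9) written two ways.
$utf8   = "\u{00E9}"; // UTF-8: two bytes
$cp1252 = "\xE9";     // Windows-1252 ("ANSI" on Western-European Windows): one byte

echo bin2hex($utf8), "\n";   // c3a9
echo bin2hex($cp1252), "\n"; // e9

// A program that reads the single byte 0xE9 while expecting UTF-8 sees an
// invalid sequence; one that reads C3 A9 while expecting Windows-1252
// displays two characters, "Ã©".
```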

In order to generate valid UTF-8 output, you have to feed all XML functions proper UTF-8 input. The most straightforward way to do it is to save your PHP source code as UTF-8 and then just type some non-English letters. If you are reading data from external sources (such as a database), you need to ensure that the complete toolchain (storage, connection and PHP code) makes proper use of encodings.
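One cheap way to catch a non-UTF-8 link in that chain is to check each string before it goes into the DOM. A sketch, where `add_text_element()` is a hypothetical helper: `preg_match()` with the `u` modifier returns `false` for byte sequences that are not valid UTF-8, so bad input fails loudly instead of silently producing broken XML.

```php
<?php
// Hypothetical helper: refuse to add text that is not valid UTF-8.
function add_text_element(DOMDocument $doc, DOMNode $parent, string $name, string $text): DOMElement
{
    // With the /u modifier, preg_match() returns false for malformed UTF-8.
    if (preg_match('//u', $text) !== 1) {
        throw new InvalidArgumentException(
            "Value for <$name> is not valid UTF-8 (hex: " . bin2hex($text) . ')'
        );
    }
    $el = $doc->createElement($name);
    $el->appendChild($doc->createTextNode($text));
    $parent->appendChild($el);
    return $el;
}

$doc  = new DOMDocument('1.0', 'UTF-8');
$root = $doc->appendChild($doc->createElement('rows'));

add_text_element($doc, $root, 'name', "Fran\u{00E7}ois"); // UTF-8 input: accepted

try {
    // "\xE7" is "ç" in Windows-1252/ISO-8859-1, but not valid UTF-8 here.
    add_text_element($doc, $root, 'name', "Fran\xE7ois");
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
}
```

If the exception fires for values read from the database, the connection or storage encoding is the place to look, not the XML code.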

In any case, using "Save as" in an unspecified piece of software is not a reliable way to determine a file's encoding.
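Rather than trusting an editor's dialog, you can inspect the bytes directly. A sketch with a hypothetical `describe_encoding()` helper. Keep in mind that a file containing only ASCII characters is byte-for-byte identical in UTF-8 and in the common Windows ANSI code pages, so an editor may label such a file "ANSI" even though it is also valid UTF-8.

```php
<?php
// Hypothetical helper: report what the bytes of a file are consistent with.
function describe_encoding(string $path): string
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // The UTF-8 byte order mark is EF BB BF; it is optional.
    $hasBom = strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;

    // With /u, preg_match() fails on any malformed UTF-8 sequence.
    if (preg_match('//u', $bytes) !== 1) {
        return 'not valid UTF-8 (probably a legacy code page)';
    }
    // Without /u, this matches any single byte outside the ASCII range.
    if (preg_match('/[\x80-\xFF]/', $bytes) !== 1) {
        return 'pure ASCII (same bytes in UTF-8 and common ANSI code pages)';
    }
    return $hasBom ? 'UTF-8 with BOM' : 'UTF-8 without BOM';
}
```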

Álvaro González
  • You are right: when I type non-English letters as data to write to the XML file, I get them as they should be. But I'm reading data from an external source, a database. I'm using phpMyAdmin, and I'm sure that I set all fields to utf8_general_ci. When I export the database (to an SQL file or something else), it always contains the non-English characters and looks all right. Can you tell me what else I need to check? – Sab May 06 '14 at 18:39
  • Previously I was using `new mysqli` to connect to the database, but I changed it to `mysql_connect` and added `mysql_set_charset('utf8', $con);`. Now the output in the XML file looks right. You helped by pointing out that the problem was in the database connection, not in creating the XML file. – Sab May 06 '14 at 19:35
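Note that the `mysql_*` extension was deprecated in PHP 5.5 and removed in PHP 7.0. The same fix is available on the `mysqli` connection the question started with, via `mysqli::set_charset()`. A sketch, where `open_utf8_connection()` is a hypothetical helper and all credentials are placeholders; `'utf8mb4'` assumes MySQL 5.5.3 or later (use `'utf8'`, as in the comment above, on older servers).

```php
<?php
// Hypothetical helper; host, user, password and database name are placeholders.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    $mysqli = new mysqli($host, $user, $pass, $db);
    if ($mysqli->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $mysqli->connect_error);
    }
    // Declare the encoding this connection sends and expects. Without it the
    // connection uses the server/client default (latin1 on many MySQL
    // versions before 8.0), and UTF-8 columns arrive re-encoded.
    if (!$mysqli->set_charset('utf8mb4')) {
        throw new RuntimeException('Error loading utf8mb4: ' . $mysqli->error);
    }
    return $mysqli;
}

// Usage (placeholders):
//   $db = open_utf8_connection('localhost', 'user', 'secret', 'mydb');
//   $result = $db->query('SELECT name FROM cities');
```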