
After hours of searching, I can't find a way to force a file to be saved as UTF-8. If the string contains any character that exists only in UTF-8, the file is saved as UTF-8, but if every character exists in both ASCII and UTF-8, the file is saved as ASCII:

file_put_contents("test1.xml", "test"); // Saved as ASCII
file_put_contents("test2.xml", "test&"); // Saved as ASCII
file_put_contents("test3.xml", "tëst&"); // Saved as UTF-8

I can add a BOM to force a UTF-8 file, but the receiver of the document does not accept a BOM:

 file_put_contents("utf8-force.xml", "\xEF\xBB\xBFtest&"); // Stored as UTF-8 because of the BOM

I checked the encoding with a short snippet:

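// `file -I` prints the MIME type and charset on macOS; GNU `file` on Linux uses `-i`.
exec('file -I ' . escapeshellarg($file), $output);
print_r($output);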

As far as I understand, the character & is a single byte in ASCII and a two-byte character in UTF-8, so the receiver can't read the file. Is there a way to force a file to UTF-8 without a BOM in PHP?

Stefan
  • I don't know, but did you try http://php.net/manual/en/function.mb-convert-encoding.php – AbraCadaver Jan 04 '18 at 16:53
  • That's not how UTF-8 works: an "ASCII" file is byte-for-byte identical to a UTF-8 file if you're only using code points up to 127. UTF-8 files categorically do not need BOMs, and your receiver is the problem in this situation. – Sammitch Jan 04 '18 at 19:17
  • The receiver was indeed the problem, since they always want a file that is detected as UTF-8. So the solution in this particular case was to add a character that doesn't exist in ASCII (ë, é, etc.) to an attribute of the XML. – Stefan Jan 15 '18 at 05:57
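
Sammitch's point can be checked at the byte level. The following is a minimal sketch, assuming the mbstring extension is loaded; the helper name `describe_bytes` is hypothetical and not part of the thread. It suggests that "test&" produces the same bytes whether it is labelled ASCII or UTF-8, and that only a non-ASCII character such as ë introduces a multi-byte sequence.

```php
<?php
// Hypothetical helper (not from the thread): summarise a string at the byte level.
function describe_bytes(string $s): array
{
    return [
        'bytes'      => strlen($s),                        // raw byte count
        'hex'        => bin2hex($s),                       // exact bytes that would be written to disk
        'valid_utf8' => mb_check_encoding($s, 'UTF-8'),    // true if the bytes are well-formed UTF-8
        'pure_ascii' => !preg_match('/[\x80-\xFF]/', $s),  // true if no byte is 0x80 or above
    ];
}

$ascii = describe_bytes("test&");        // ASCII-only content
$mixed = describe_bytes("t\u{00EB}st&"); // contains ë (U+00EB)

print_r($ascii);
print_r($mixed);
```

In this sketch the ASCII-only string is both pure ASCII and valid UTF-8, which is why a tool like `file` has nothing to distinguish the two and reports the narrower label.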

1 Answer


file_put_contents will not convert the encoding. You have to convert the string explicitly with mb_convert_encoding.

Try this:

$data = 'test';
$data = mb_convert_encoding($data, 'UTF-8', 'OLD-ENCODING');
file_put_contents("test1.xml", $data); 

Or you can try using a stream filter:

$data = 'test';
$file = fopen('test.xml', 'w');
// Convert from OLD-ENCODING to UTF-8 as the data is written
stream_filter_append($file, 'convert.iconv.OLD-ENCODING/UTF-8', STREAM_FILTER_WRITE);
fwrite($file, $data);
fclose($file);
azjezz
  • Unfortunately, that is not a solution since ASCII is a valid subset of UTF-8. I did try that already. – Stefan Jan 04 '18 at 17:00
  • Yes, I did try that already from the post https://stackoverflow.com/questions/4839402/how-to-write-file-in-utf-8-format – Stefan Jan 04 '18 at 17:06
  • I have done research for hours! And unfortunately, that is not the solution. – Stefan Jan 04 '18 at 17:14
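
Stefan's objection can be illustrated with a short check. This is only a sketch, assuming the mbstring extension is available; the file name `utf8-check.xml` and the variable names are made up for the example. It indicates that converting ASCII-only text to UTF-8 returns the same bytes, so the written file has no BOM, is valid UTF-8, and still looks like ASCII to a detector.

```php
<?php
$data = 'test&';

// Converting ASCII-only text to UTF-8 leaves the bytes unchanged.
$converted = mb_convert_encoding($data, 'UTF-8', 'ASCII');

// Hypothetical path, used only for this illustration.
$path = sys_get_temp_dir() . '/utf8-check.xml';
file_put_contents($path, $converted);

$written   = file_get_contents($path);
$unchanged = ($written === $data);                        // same bytes as the original string
$hasBom    = strncmp($written, "\xEF\xBB\xBF", 3) === 0;  // true only if the file starts with a BOM
$validUtf8 = mb_check_encoding($written, 'UTF-8');        // true if the bytes are well-formed UTF-8

var_dump($unchanged, $hasBom, $validUtf8);
unlink($path);
```

A receiver that decodes the file as UTF-8 should therefore read it correctly; only encoding detection based on content, as `file` does, would report ASCII.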