
P.S.: This is not a duplicate question. I'm not looking to write contents to a file, because that is already done. I'm looking to change the encoding of the file itself to UTF-8, and there is a difference.

How do I generate a UTF-8 file rather than an ANSI one? (I mean the file itself, not its contents.)

For example, most IDEs have an encoding option that lets you change the encoding of your file. I'm generating a bulk export from my database that produces a lot of individual text files, but all of them are ANSI by default. I'm just looking for a PHP function that lets me change the encoding before the bulk export is generated.

If the source code helps, I can post it here. Just let me know.

Thanks in advance.

EDITED

Here is a screenshot of what I'm asking about:

[screenshot not preserved]

When I generate the file "testecli01.csv", it always gets ANSI encoding. Whatever I do in my script, it is always ANSI, and I need it to be UTF-8; that's all. It seems simple, but I have no idea how to do it.

djdy
devasia2112
  • Except that you're generating files from a database, the question [How to write file in UTF-8 format?](http://stackoverflow.com/questions/4839402/how-to-write-file-in-utf-8-format) quite matches yours. There is no magical call to change the encoding of a file, you have to read it, change the encoding, then write it back. – zneak Jul 08 '11 at 19:29
  • The above comment has it right. There's no magical database-encoding-conversion free lunch. – shelhamer Jul 08 '11 at 19:32
  • It's not the same question: it's about the file itself, not the contents of the file. This is something that freaks me out... there are no good resources, not even the PHP docs themselves. I can do it by hand, but I have thousands of files... 0_o – devasia2112 Jul 08 '11 at 19:49
  • @Fernando, text files don't have an 'encoding' property. The closest thing to that, for UTF-8, is a BOM marker at the beginning of a file. But even then, you _still_ have to convert the contents of the file to UTF-8: just throwing in a BOM isn't going to fix anything unless there are no special characters in your file, in which case they were valid UTF-8 to start with. – zneak Jul 09 '11 at 00:21
  • `Notepad++` worked like a charm for me for converting file encodings. It's a free download. – Accountant م Aug 20 '16 at 18:29
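To make zneak's point concrete: as far as PHP is concerned, a file's "encoding" is only a property of its bytes, so all a script can do is inspect or rewrite them. A minimal sketch, assuming the mbstring extension is enabled; `describe_encoding` is a hypothetical helper, not a built-in:

```php
<?php
// Hypothetical helper: reports what can actually be known from a file's bytes.
function describe_encoding(string $bytes): string
{
    // A UTF-8 BOM is nothing more than the three bytes EF BB BF at the start.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8 with BOM';
    }
    // Pure 7-bit ASCII is already valid UTF-8 (and valid in most "ANSI" code pages).
    if (!preg_match('/[\x80-\xFF]/', $bytes)) {
        return 'ASCII (also valid UTF-8)';
    }
    // Bytes above 0x7F that do not form valid UTF-8 sequences usually point
    // to a legacy single-byte encoding such as Windows-1252.
    return mb_check_encoding($bytes, 'UTF-8')
        ? 'UTF-8 without BOM'
        : 'not UTF-8 (legacy encoding?)';
}

echo describe_encoding("abc"), "\n";         // ASCII (also valid UTF-8)
echo describe_encoding("caf\xC3\xA9"), "\n"; // UTF-8 without BOM
echo describe_encoding("caf\xE9"), "\n";     // not UTF-8 (legacy encoding?)
```

Fed with `file_get_contents()` on an exported file, it shows why an editor may label a pure-ASCII file "ANSI" even though it is already valid UTF-8, while a file with accented characters needs its bytes converted.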

4 Answers


If your 3rd-party program "does not support files in ANSI, only UTF-8", as you mentioned in a comment, then it is most likely expecting a BOM.

While the Unicode Standard does allow a BOM in UTF-8,[2] it does not require or recommend it.[3] Byte order has no meaning in UTF-8[4] so a BOM serves only to identify a text stream or file as UTF-8.

The reason the BOM is recommended against is that it defeats the ASCII back-compatibility that is part of UTF-8's design.

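So, strictly speaking, your 3rd-party program isn't fully compliant with the standard, because the BOM should be optional. Plain ASCII is 100% valid UTF-8, and that is one of the main drivers of UTF-8's design: anything that understands UTF-8 according to the standard, by definition, also understands ASCII. (This does not extend to "ANSI" code pages such as Windows-1252, whose accented characters are single bytes above 0x7F and are not valid UTF-8.)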

Try writing "\xEF\xBB\xBF" to the front of the file and see if that solves your problem.
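A minimal sketch of that suggestion for the bulk-export case, assuming the exported text comes out of the database as Windows-1252 (the usual meaning of "ANSI" on Western-European Windows systems) and that the mbstring extension is available. `write_utf8_with_bom` is a hypothetical helper, and the temp path is only for the demo; if your data is already UTF-8, drop the conversion and keep only the BOM.

```php
<?php
// Hypothetical helper: converts legacy text to UTF-8 and writes it with a BOM.
function write_utf8_with_bom(string $path, string $text, string $from = 'Windows-1252'): void
{
    // Re-encode the bytes: adding a BOM alone does not fix accented characters.
    $utf8 = mb_convert_encoding($text, 'UTF-8', $from);
    // "\xEF\xBB\xBF" is the UTF-8 encoding of U+FEFF, the byte order mark.
    if (file_put_contents($path, "\xEF\xBB\xBF" . $utf8) === false) {
        throw new RuntimeException("Could not write $path");
    }
}

// "Jos\xE9" is "José" in Windows-1252 (0xE9 is é).
$path = sys_get_temp_dir() . '/testecli01.csv';
write_utf8_with_bom($path, "1;Jos\xE9\n");
echo bin2hex(file_get_contents($path)), "\n"; // efbbbf313b4a6f73c3a90a
```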

Community
Davy8
  • The 3rd-party program is from the government and it is a very old program; accents are not allowed, so you can imagine what kind of program it is... It doesn't matter, because I generate the data. I think ANSI is correct because it has all the accents, and the data is OK, but the government program doesn't accept accents. Perhaps I'll remove all the accents from my database... ahahaha. Thanks anyway, I didn't know about the BOM. – devasia2112 Jul 12 '11 at 15:00

I do not know of a database that will do the encoding conversion for you easily. For example, in MySQL, you have to reset all the character encodings for the db, tables, and columns, AND THEN convert the data.

I would suggest instead that you create your database dump and use iconv to change the encoding, either on the command line:

iconv -f original_charset -t utf-8 dumpTextData > convertedTextData

or in PHP (taken from How to write file in UTF-8 format?)

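$input = fopen($file, 'r');
// Write to a temporary file: opening $file itself with 'w' would truncate it before it is read.
$output = fopen($file . '.tmp', 'w');
// The source encoding comes first, then the target: OLD-ENCODING -> UTF-8.
stream_filter_append($input, 'convert.iconv.OLD-ENCODING/UTF-8');
stream_copy_to_stream($input, $output);
fclose($input);
fclose($output);
rename($file . '.tmp', $file); // replace the original with the converted copy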

NOTE: edited to avoid leaking file descriptors.

Community
shelhamer
  • Your copied answer leaks file descriptors. If you have more than a few hundred files, this will cause problems. – zneak Jul 08 '11 at 19:32
  • @zneak thanks for pointing that out. I forget you can't trust people to know you need an `fclose`. Edited to include. – shelhamer Jul 08 '11 at 19:34
  • You're still leaking the file descriptor from `fopen` in the `stream_copy_to_stream`. :) I've fixed it for you. – zneak Jul 08 '11 at 19:35
  • ...and I need to remember how to read haha. I went to edit but you beat me to it. Thanks. – shelhamer Jul 08 '11 at 19:38
  • @Zneak Actually there are thousands of files, not hundreds. It is not in full use yet, but I expect to use it in a production environment. The problem is that the thousands of .txt files will be imported by a 3rd-party program that doesn't support ANSI files, only UTF-8, so the .txt files need to be in that format. – devasia2112 Jul 08 '11 at 19:43
  • If you notice, the answer was already edited to not leak descriptors and is correct (see the two `fclose`). A file "in" utf-8 is a utf-8 file, I am not sure what the problem you are facing is. – shelhamer Jul 08 '11 at 20:07
  • Indeed, the answer is good; I did something very close to this myself, but in my specific case it won't work. I've also edited my post, and I think it is easier to understand now. – devasia2112 Jul 11 '11 at 17:19

Excel likes CSV files to be UTF-16LE and to begin with the bytes "\xFF\xFE".

My code to build a file for excel is:

echo "\xFF\xFE"; // byte order mark for a UTF-16LE file

foreach ($rows as $row) {
    // $row is assumed to be one already-formatted CSV line of UTF-8 text
    echo mb_convert_encoding($row, 'UTF-16LE', 'UTF-8');
}
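A slightly fuller sketch of the same idea, assuming `$rows` is an array of arrays of UTF-8 strings and the mbstring extension is available. It lets `fputcsv()` handle quoting, then converts the finished text to UTF-16LE behind the FF FE byte order mark. `build_excel_csv` is a hypothetical helper, and the tab default is an assumption (tab-separated UTF-16 text is commonly reported to open more reliably in Excel than comma-separated); pass `','` if your Excel splits on commas.

```php
<?php
// Hypothetical helper: builds CSV text in UTF-16LE with a byte order mark.
function build_excel_csv(array $rows, string $delimiter = "\t"): string
{
    // Let fputcsv() do the quoting/escaping in an in-memory stream.
    $mem = fopen('php://temp', 'r+');
    foreach ($rows as $row) {
        // Enclosure and escape passed explicitly (newer PHP versions warn otherwise).
        fputcsv($mem, $row, $delimiter, '"', '\\');
    }
    rewind($mem);
    $utf8 = stream_get_contents($mem);
    fclose($mem);

    // FF FE marks the data as UTF-16 little-endian.
    return "\xFF\xFE" . mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
}

$csv = build_excel_csv([['id', 'name'], ['1', "Jos\u{E9}"]]);
file_put_contents(sys_get_temp_dir() . '/export.csv', $csv);
```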
Littm
Shahard

The old encoding comes first, as it does in the iconv function. You also can't read from and write to the same file.

    $input = fopen($path, 'r');
    $output = fopen($path . '.tmp', 'w');
    stream_filter_append($input, 'convert.iconv.OLDENCODING/UTF-8');
    stream_copy_to_stream($input, $output);
    fclose($input);
    fclose($output);
    unlink($path);
    rename($path . '.tmp', $path);
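Since the question involves thousands of exported files, here is a sketch that wraps the same temp-file approach in a loop over a directory. It assumes the files are `*.txt` in a single directory and are Windows-1252 ("ANSI"), and that PHP's iconv extension is available; `convert_dir_to_utf8` and those defaults are hypothetical, so adjust them to your export.

```php
<?php
// Hypothetical helper: converts every matching file in $dir to UTF-8 in place.
function convert_dir_to_utf8(string $dir, string $from = 'WINDOWS-1252', string $pattern = '*.txt'): int
{
    $count = 0;
    // glob() returns false on error, so fall back to an empty list.
    foreach (glob(rtrim($dir, '/') . '/' . $pattern) ?: [] as $path) {
        $tmp = $path . '.tmp';
        $input = fopen($path, 'r');
        if ($input === false) {
            throw new RuntimeException("Cannot read $path");
        }
        $output = fopen($tmp, 'w');
        if ($output === false) {
            fclose($input);
            throw new RuntimeException("Cannot write $tmp");
        }
        // Source encoding first, target second.
        stream_filter_append($input, "convert.iconv.$from/UTF-8");
        stream_copy_to_stream($input, $output);
        // Close both handles on every iteration so thousands of files
        // do not exhaust the file-descriptor limit.
        fclose($input);
        fclose($output);
        rename($tmp, $path);
        $count++;
    }
    return $count;
}
```

Run it only once per file: converting a file that is already UTF-8 again would double-encode its accented characters.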
Vaci