6

I have a script that reads the content of a remote file and writes it to a file on the local server. The file contains the characters ąčęėįšųūž. After the data is written to the local file, the UTF-8 encoding is lost. My script:

<?php 

$data = file_get_contents('remote_file_address');

echo $data; //encoding is ok

$file = dirname(__FILE__) . '/../downloads/data.csv';

file_put_contents($file,$data); //invalid encoding in data.csv file

?>

I also followed the instructions in this post (How to write file in UTF-8 format?), but it still does not work.

So what is wrong with that? Any ideas?

Bounce
  • 7
    at php.net there are a bunch of comments about this. http://php.net/manual/de/function.file-put-contents.php . Have you tried to re-encode the data like `file_put_contents($myFile, utf8_encode($myContent));` or setting the BOM like `file_put_contents($myFile, "\xEF\xBB\xBF".$myContent);` ? – sofl Jun 20 '12 at 08:31
  • If it's invalid, you should see the problem when you do `echo file_get_contents(dirname(__FILE__) . '/../downloads/data.csv');`. Is that the case? – Ja͢ck Jun 20 '12 at 08:38
  • @sofl, yes I've tried all these methods. – Bounce Jun 20 '12 at 08:41
  • @Jack, when I echo file_get_contents, I get the correct results. But after writing results to the file, encoding becomes invalid. – Bounce Jun 20 '12 at 08:51
  • 2
    @Bounce how did you determine that the encoding is wrong? – Ja͢ck Jun 20 '12 at 08:52
  • 2
    @Bounce: How do you know that the encoding becomes invalid? What is the encoding btw? – hakre Jun 20 '12 at 08:52
  • Since you're writing to CSV, allow me to guess that you're opening it using Excel. Let me go on record to say that Excel notoriously sucks with encodings. :) – deceze Jun 20 '12 at 08:54
  • @hakre, I know that, when I open the local file. But I guess the problem is with remote file. The remote file is encoded with windows-1257. And when I try to change the encoding to UTF-8, all symbols like ąčęėįšųūž become hieroglyphs. Because my local file encoding is correct(UTF-8 without BOM). – Bounce Jun 20 '12 at 09:52
  • @deceze, no, I'm using Notepad++ :) – Bounce Jun 20 '12 at 09:53
  • If the original is encoded in 1257, then the final file is also 1257 and Notepad++ needs to open it as if it's a 1257 encoded file. If you want to convert the encoding and actually save a UTF-8 file, convert it with [`iconv`](http://php.net/manual/en/function.iconv.php). – deceze Jun 20 '12 at 09:59
  • @deceze, thanks I've already done it ;) – Bounce Jun 20 '12 at 10:04
  • @Bounce: A little hint: It's not helpful to ask about an encoding problem and give the encodings only after being asked for them. In your case you only need to re-encode the file data from `windows-1257` to `UTF-8` and you're done. See [How do I change the encoding of a string?](http://kore-nordmann.de/blog/php_charset_encoding_FAQ.html#how-do-i-change-the-encoding-of-a-string) – hakre Jun 20 '12 at 11:48
  • in short: BOM works – djdance Feb 10 '22 at 10:26

3 Answers

12

The problem was that the remote file is encoded in windows-1257, not UTF-8. I found the solution here.

So the correct code should look like this:

<?php 

$data = file_get_contents('remote_file_address');

$data = iconv("CP1257","UTF-8", $data);

$file = dirname(__FILE__) . '/../downloads/data.csv';

file_put_contents($file,$data);

?>
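
To see what the conversion does at the byte level, here is a minimal sketch. The helper name `cp1257_to_utf8` is hypothetical, and the sketch assumes the local iconv implementation knows the `CP1257` charset name. It also checks the return value, because `iconv` returns `false` when the conversion fails.

```php
<?php

// Hypothetical helper (not from the answer): convert windows-1257 bytes to
// UTF-8 and fail loudly instead of silently writing a broken file.
function cp1257_to_utf8(string $bytes): string
{
    $utf8 = iconv('CP1257', 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException('Could not convert input from CP1257');
    }
    return $utf8;
}

// 0xE0 is the single byte for "ą" in windows-1257.
$converted = cp1257_to_utf8("\xE0");

echo bin2hex($converted), "\n"; // c485, the two-byte UTF-8 form of "ą"
```

The same check could be applied to the result of `iconv` in the script above before calling `file_put_contents`.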
Bounce
  • There is an interesting answer for cases when you don't know the original encoding - http://stackoverflow.com/a/7980354/1835470 , quick hint: `mb_detect_encoding()` – jave.web Aug 18 '16 at 18:54
9

PHP does not know about encodings. Strings in PHP are simply byte arrays that store raw bytes. When reading from somewhere into a string, the text is read in raw bytes and stored in raw bytes. When writing to a file, PHP writes the raw bytes into the file. PHP does not convert encodings by itself at any point. You do not need to do anything special at any point, all you need to do is to not mess with the encoding yourself. If the encoding was UTF-8 to begin with, it'll still be UTF-8 if you didn't touch it.

If the encoding is weird when opening the final file in some other program, most likely that other program is misinterpreting the encoding. The file is fine, it's simply not being displayed correctly.
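
A small sketch of that byte-for-byte behaviour, assuming a writable temporary directory (the temporary file name is arbitrary): the bytes written by `file_put_contents` are exactly the bytes read back by `file_get_contents`.

```php
<?php

// "ą" encoded as UTF-8 is the two bytes 0xC4 0x85.
$original = "\xC4\x85";

// Write the string to a temporary file and read it back.
$path = tempnam(sys_get_temp_dir(), 'enc');
file_put_contents($path, $original);
$readBack = file_get_contents($path);
unlink($path);

// strlen() counts bytes, not characters.
var_dump(strlen($original));        // int(2)
var_dump($readBack === $original);  // bool(true)
var_dump(bin2hex($readBack));       // string(4) "c485"
```

Nothing in the round trip inspects or changes the encoding, so a windows-1257 input would come back as windows-1257 in the same way.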

deceze
1

Make sure that both your script and the remote file are encoded in UTF-8, and that the software you use to read data.csv opens it as UTF-8. I personally use Notepad++ to check this. If everything is in UTF-8, you don't need any utf8_(en|de)code function. You only need to convert when the remote file is not encoded in UTF-8. Note that utf8_encode only converts from ISO-8859-1, so other source encodings, such as windows-1257, need iconv or a similar function.
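
One way to apply this advice in code is sketched below. It assumes the mbstring and iconv extensions are available. The helper name `ensure_utf8` and its default fallback of `CP1257` are hypothetical and chosen only to match the question. The check is a heuristic: bytes in another encoding can occasionally also form valid UTF-8, so knowing the real source encoding is still the reliable option.

```php
<?php

// Hypothetical helper: return $bytes unchanged if they are already valid
// UTF-8, otherwise convert them from an assumed fallback encoding.
function ensure_utf8(string $bytes, string $fallback = 'CP1257'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8, leave it alone
    }
    $converted = iconv($fallback, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("Could not convert input from $fallback");
    }
    return $converted;
}

// Valid UTF-8 passes through; a lone windows-1257 byte gets converted.
$alreadyUtf8 = ensure_utf8("\xC4\x85"); // "ą" in UTF-8
$fromCp1257  = ensure_utf8("\xE0");     // "ą" in windows-1257
```

Both calls should yield the same two UTF-8 bytes for "ą".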

niconoe