
Using PHP CLI, this works well:

$result = iconv('LATIN1', 'UTF-8', 'N�n��;M�tt');

Result is: Nönüß

This also works for CP437, Windows, Macintosh etc.

On Apache, the SAME code results in:

$result = iconv('LATIN1', 'UTF-8', 'N�n��;M�tt');

Result is: NÃ¶nÃ¼ÃŸ

I googled around and added setlocale(LC_ALL, "en_US.utf8"); to the script, but it made no difference. Thanks for helping!

I run Debian Linux with Apache 2 and PHP 5.4. I am trying to convert different CSV files into UTF-8 as they are uploaded, so that they can be processed.

UPDATE: I found my own solution.

$result = utf8_decode(iconv('LATIN1', 'UTF-8', 'N�n��;M�tt'));

utf8_decode makes it show up correctly in the browser and when saved to the MySQL DB.

1 Answer

There are always two sides to encoding: the encoded string, and the entity interpreting this encoded string into readable characters! This "entity", as I'll ambiguously call it, can be the database, the browser, your text editor, the console, or whatever else.

$result = iconv('LATIN1', 'UTF-8', 'N�n��;M�tt');

Result is: Nönüß

Not sure where you're getting 'N�n��;M�tt' from exactly, but the UNICODE REPLACEMENT CHARACTERS � in there indicate that you're trying to interpret this string as UTF-8, but the string is not actually UTF-8 encoded. Using iconv to convert it from Latin-1 to UTF-8 makes the correct characters appear - that means the string was originally Latin-1 encoded and converting it to your expected encoding solved the discrepancy.
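To make that concrete, here is a minimal sketch. It assumes the input is the Latin-1 encoding of "Nönüß"; the byte values are written out by hand for illustration, `ISO-8859-1` is used as the portable name for Latin-1, and `mb_check_encoding` requires the mbstring extension.

```php
<?php
// "Nönüß" encoded as Latin-1 (ISO-8859-1): one byte per character.
$latin1 = "N\xF6n\xFC\xDF";

// These bytes are not valid UTF-8, which is why a UTF-8 viewer shows U+FFFD.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)

// Convert from the encoding the data is actually in to the one you want.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

var_dump(mb_check_encoding($utf8, 'UTF-8')); // bool(true)
var_dump(bin2hex($utf8));                    // string(16) "4ec3b66ec3bcc39f"
```

The non-ASCII characters go from one byte each to two bytes each; the text itself is unchanged, only its byte representation.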

On Apache, the SAME code results in NÃ¶nÃ¼ÃŸ

That means the interpreting party here is not interpreting the string as UTF-8 this time, even though the string is UTF-8. I assume by "Apache" you mean "in the browser". You need to tell your browser through HTTP headers or HTML meta tags that it's supposed to interpret the text as UTF-8.
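As a rough sketch of the HTTP header route, assuming the script produces an HTML page and that nothing has been output before the `header()` call (the sample bytes are again illustrative):

```php
<?php
// Declare the response encoding before any output is sent.
header('Content-Type: text/html; charset=utf-8');

$latin1 = "N\xF6n\xFC\xDF";                      // Latin-1 bytes for "Nönüß" (illustrative)
$utf8   = iconv('ISO-8859-1', 'UTF-8', $latin1); // now UTF-8

// Escape for HTML, stating the encoding explicitly.
echo htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
```

The HTML equivalent is a `<meta charset="utf-8">` tag in the document head. Either way, the bytes you send stay UTF-8 and only the browser's interpretation changes.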

I found my own solution.

$result = utf8_decode(iconv('LATIN1', 'UTF-8', 'N�n��;M�tt'));

Guess what utf8_decode does. It converts the encoding of a string from UTF-8 to Latin-1. So the above code converts Latin-1 to ... Latin-1.
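You can check the round trip yourself. This sketch swaps `utf8_decode` for the equivalent `iconv` call, since `utf8_decode` is deprecated as of PHP 8.2; the sample bytes are an assumption standing in for your CSV data.

```php
<?php
$latin1 = "N\xF6n\xFC\xDF";                      // Latin-1 bytes for "Nönüß" (illustrative)
$utf8   = iconv('ISO-8859-1', 'UTF-8', $latin1); // Latin-1 -> UTF-8
$back   = iconv('UTF-8', 'ISO-8859-1', $utf8);   // UTF-8 -> Latin-1, as utf8_decode would do here

var_dump($back === $latin1); // bool(true): the two conversions cancel out
```

So the browser displays the result correctly only because it happens to be reading the page as Latin-1, not because the data ended up in UTF-8.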

Please read the following:

deceze
  • OK, thanks. On the command line, utf8_decode was not necessary, but for correct display in the browser it was the missing link (N�n��;M�tt is what showed in the browser before the iconv step, based on data read from a CSV file uploaded by a user). – user1145075 Aug 22 '14 at 09:37
  • Well, you simply have an encoding mismatch. The command line seems to expect and display UTF-8, the browser expects Latin-1 because you haven't told it otherwise, the CSV seems to be Latin-1 encoded. Convert the CSV data to UTF-8 as you're reading from it, and keep everything in UTF-8 from there on. – deceze Aug 22 '14 at 09:50
  • That I have not understood yet. The command line has locale "en_US.utf8" set, you can throw Windows/Macintosh/MsDOS/Latin etc. encoded text at it, and it will convert it all to UTF-8 and show the correct output on the command line. The web interface does exactly that, too (locale is also set likewise), but will not output correctly unless one first runs the output through utf8_decode. So, it seems that iconv does a correct conversion to UTF-8, but the browser display doesn't show correctly unless one runs the result through utf8_decode. That is at least what I understand about it. – user1145075 Aug 22 '14 at 13:21
  • The command line will never **convert** anything automatically by itself. It simply *interprets* whatever it gets and displays it in the set encoding. The same goes for a browser. I suggest you read the last two articles I linked to above to get a grip for encodings. – deceze Aug 22 '14 at 13:24
  • I did not say this but I wrote a script that will detect the encoding and convert it. That is what I mean by "automatic". Of course a command line or browser will not do anything to a text, but my script performs coding conversions using "iconv". But I will look at the articles. – user1145075 Aug 22 '14 at 18:58
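The advice in the comments above (convert the CSV data to UTF-8 while reading it, then keep everything in UTF-8) could be sketched roughly as follows. `readCsvAsUtf8` is a hypothetical helper, the semicolon delimiter is a guess based on the sample string, and the source encoding is assumed to be known or detected beforehand. The PHP manual notes that `fgetcsv` takes the locale into account, so test this against your real files.

```php
<?php
/**
 * Hypothetical helper: read a CSV file and return its rows as UTF-8.
 * $sourceEncoding must be known or detected beforehand; it is not guessed here.
 */
function readCsvAsUtf8($path, $sourceEncoding, $delimiter = ';')
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $rows = array();
    while (($fields = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        if ($fields === array(null)) {
            continue; // skip blank lines
        }
        $converted = array();
        foreach ($fields as $field) {
            // Convert each field exactly once, at the input boundary.
            $converted[] = iconv($sourceEncoding, 'UTF-8', $field);
        }
        $rows[] = $converted;
    }
    fclose($handle);

    return $rows;
}
```

Calling `readCsvAsUtf8($uploadedPath, 'ISO-8859-1')` would then give you UTF-8 strings that can go to the database and the browser unchanged, provided both are told to expect UTF-8.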