2

I'm having a hard time with character encoding. I suspect that my function that displays the date returns non-UTF-8 characters (in août, the û is replaced by a question mark inside a diamond).

When working on my local server everything is fine, but when I push my code to my staging server, it does not display properly.

  • My PHP files are saved as UTF-8 without BOM.
  • If I inspect my output page, the headers indicate UTF-8.
  • My local machine is a Mac with MAMP installed, and my staging server runs CentOS with cPanel installed.

Here is the part I suspect is causing the problem:

$langCode = "fr_FR"; /* Also tried fr_FR.UTF-8 */
setlocale(LC_ALL, $langCode);
$monthName = _(strftime("%B", strtotime($dateStr)));
echo $monthName; /* Also tried utf8_encode($monthName): it worked on my staging server but not on my local server! I'm using */
Jonathan Lafleur
  • I think the problem has something to do with the OS or the web server. – Alireza Jul 22 '14 at 19:07
  • aoÃ»t looks like correct UTF-8 but served as Latin-1 or similar. What does your browser indicate as the character set of the page? – tripleee Jul 22 '14 at 19:08
  • try this `echo utf8_decode($monthName);` – Khalid Jul 22 '14 at 19:11
  • Have a look at the headers your server sends. I suspect (like @tripleee) that it sends `Content-Type: text/html; charset=latin-1` – VMai Jul 22 '14 at 19:11
  • To make sure that the problem is with the server, just try setting the character encoding to Unicode in your browser's View options and see the result. – Bahrami-Reza Jul 22 '14 at 19:12
  • possible duplicate of [How do I set Character Encoding to UTF-8 for default.html?](http://stackoverflow.com/questions/905173/how-do-i-set-character-encoding-to-utf-8-for-default-html) – tripleee Jul 22 '14 at 19:16
  • @Alireza I'm pretty sure too – Jonathan Lafleur Jul 22 '14 at 19:23
  • @tripleee Sorry, it was a question mark in a diamond. I made a mistake when typing my question here: when I use utf8_encode it gives me aoÃ»t on my local server, but it's okay on the staging one. And I don't think it's a duplicate, as the problem is not the encoding of my files. I've already checked that part: they are all UTF-8 without BOM. – Jonathan Lafleur Jul 22 '14 at 19:24
  • @VMai I've already checked the charset that is sent; it is UTF-8 on both servers. – Jonathan Lafleur Jul 22 '14 at 19:24
  • Are you by chance using this in conjunction with a DB? – Funk Forty Niner Jul 22 '14 at 19:24
  • @Bahrami-Reza If I change the charset in the browser to ISO-8859-1, it works flawlessly. – Jonathan Lafleur Jul 22 '14 at 19:25
  • @Fred-ii- yes I retrieve the date as 20141001 then use strtotime and strftime to modify it. – Jonathan Lafleur Jul 22 '14 at 19:26
  • So what are the actual bytes which render as a question mark inside a diamond? Generally this is a symptom of an invalid UTF-8 sequence but there are other reasons it could appear. – tripleee Jul 22 '14 at 19:27
  • Then, try adding `$con->set_charset("utf8");` just before your query, replacing `$con` with your DB connection variable. That usually helps. Yet, it's different syntax if you're using PDO. – Funk Forty Niner Jul 22 '14 at 19:28
  • @tripleee It's every French character: é è à û ù ... – Jonathan Lafleur Jul 22 '14 at 19:29
  • @JonathanLafleur: Did you try `echo bin2hex($monthName);` – VMai Jul 22 '14 at 19:30
  • Have a look at this page on SO http://stackoverflow.com/questions/279170/utf-8-all-the-way-through - there's a lot in there that you could go through. It's hard to pinpoint what "your" actual problem could be. Any chance of showing full code and DB schema? – Funk Forty Niner Jul 22 '14 at 19:32
  • @VMai I'm not working directly with binary :P Don't need to convert it to hexadecimal... – Jonathan Lafleur Jul 22 '14 at 19:32
  • There's also http://stackoverflow.com/a/16123118/ and `utf8_encode()` that could help. Also `mb_internal_encoding("UTF-8");` http://php.net/manual/en/function.mb-internal-encoding.php - Which API are you using, `mysqli_` or PDO? – Funk Forty Niner Jul 22 '14 at 19:36
  • @JonathanLafleur: It will show us how your string is encoded. It will also show us which bytes are used for û, and whether they form the correct byte sequence for UTF-8. – VMai Jul 22 '14 at 19:37
  • @VMai On my local server: 'août' (length=5) 56656e6472656469203820616fc3bb74202d2031392068. On the staging server: string(4) "août" 56656e6472656469203820616ffb74202d2031392068. What's really strange is the difference between string(4) and length=5. Both results are a var_dump of the month name; the local server has Xdebug installed. – Jonathan Lafleur Jul 22 '14 at 19:41
  • I don't know if it matters much, since I haven't seen anything else break, but my local server runs PHP 5.5.10 and my staging server is still on PHP 5.4. – Jonathan Lafleur Jul 22 '14 at 19:43
  • Hard to say. There might be some function differences between both, but I sort of doubt it. Which MySQL API are you using, on both? – Funk Forty Niner Jul 22 '14 at 19:47
  • Does it really matter, since I retrieve integers and then convert them afterwards? – Jonathan Lafleur Jul 22 '14 at 19:55
  • I couldn't say to be certain. That would be out of the scope of my knowledge. – Funk Forty Niner Jul 22 '14 at 19:57
  • See if your DB, and table and affected column(s) all have the same encoding/collation. For example `latin1_general_ci` – Funk Forty Niner Jul 22 '14 at 20:03
  • @JonathanLafleur The results of bin2hex('août') are a) for UTF-8 encoding: 616fc3bb74, which is the 5 bytes of your length, and b) for Latin-1 encoding: 616ffb74, the 4 bytes of your length. So on the staging server the string 'août' is encoded as Latin-1. You will find these patterns inside your lengthy hex strings. – VMai Jul 22 '14 at 20:53
  • @VMai And is there a way to change how my staging server behaves? I want it to be UTF-8, not Latin-1! :\ – Jonathan Lafleur Jul 23 '14 at 13:04
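The byte comparison from the comments above can be reproduced in isolation. This is a minimal sketch, assuming the mbstring extension is available; the variable names are illustrative, and the word is written with explicit byte escapes so the result does not depend on how the file itself is saved.

```php
<?php
// "août" as UTF-8: the û is the two-byte sequence C3 BB.
$utf8 = "ao\xC3\xBBt";

// The same word as ISO-8859-1 (Latin-1): the û is the single byte FB.
$latin1 = mb_convert_encoding($utf8, "ISO-8859-1", "UTF-8");

echo bin2hex($utf8), " / ", strlen($utf8), " bytes\n";     // 616fc3bb74 / 5 bytes
echo bin2hex($latin1), " / ", strlen($latin1), " bytes\n"; // 616ffb74 / 4 bytes
```

strlen() counts bytes, not characters, which would explain why var_dump reports length 5 on one server and 4 on the other for the same visible word.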

4 Answers

2

Finally found how to find the bug and fix it.

setlocale(LC_ALL, 'fr_FR');

var_dump(mb_detect_encoding(_(strftime("%B",strtotime($dateStr)))));

The dump returned UTF-8 on the local server and FALSE on the staging server.

The PHP.net documentation for mb_detect_encoding() says:

Return Values

The detected character encoding or FALSE if the encoding cannot be detected from the given string.
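The FALSE result can be reproduced without any locale setup. The sketch below assumes the mbstring extension; the explicit candidate list and the strict flag are additions for the sake of a predictable result, whereas the call in this answer relied on the defaults.

```php
<?php
$utf8   = "ao\xC3\xBBt"; // "août" encoded as UTF-8
$latin1 = "ao\xFBt";     // "août" encoded as ISO-8859-1

// Strict mode with an explicit candidate list: a lone FB byte is valid
// in neither ASCII nor UTF-8, so no candidate matches and the result is false.
var_dump(mb_detect_encoding($utf8, ["ASCII", "UTF-8"], true));   // string(5) "UTF-8"
var_dump(mb_detect_encoding($latin1, ["ASCII", "UTF-8"], true)); // bool(false)
```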

So the charset can't be detected. I tried to force it "again":

setlocale(LC_ALL, 'fr_FR.UTF-8');

var_dump(mb_detect_encoding(_(strftime("%B",strtotime($dateStr)))));

This time the dump returned UTF-8 on both the local and the staging server. So I rolled back my code to see why it had not worked the first time I tried fr_FR.UTF-8, and I realized I had still been calling utf8_encode() on the result, which re-encodes text that is already UTF-8. As user deceze points out in a comment on that function's documentation:

In fact, applying this function to text that is not encoded in ISO-8859-1 will most likely simply garble that text.
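The garbling described in that quote can be shown at the byte level. This is only a sketch: `ensureUtf8()` is a hypothetical helper, not part of PHP, and it assumes that anything that is not valid UTF-8 is ISO-8859-1. `mb_convert_encoding()` from ISO-8859-1 to UTF-8 stands in here for `utf8_encode()`, since it performs the same conversion.

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the input is not already valid UTF-8.
// Assumes any non-UTF-8 input is ISO-8859-1, which is a guess about the data.
function ensureUtf8(string $text): string
{
    if (mb_check_encoding($text, "UTF-8")) {
        return $text; // already UTF-8: leave untouched
    }
    return mb_convert_encoding($text, "UTF-8", "ISO-8859-1");
}

$utf8   = "ao\xC3\xBBt"; // "août" as UTF-8
$latin1 = "ao\xFBt";     // "août" as ISO-8859-1

// Blind conversion treats each UTF-8 byte as a Latin-1 character: C3 BB becomes C3 83 C2 BB.
echo bin2hex(mb_convert_encoding($utf8, "UTF-8", "ISO-8859-1")), "\n"; // 616fc383c2bb74

echo bin2hex(ensureUtf8($utf8)), "\n";   // 616fc3bb74
echo bin2hex(ensureUtf8($latin1)), "\n"; // 616fc3bb74
```

The guard is a heuristic: some Latin-1 byte sequences happen to be valid UTF-8 as well, so making the source produce UTF-8 in the first place (here, through the locale name) is the more reliable fix.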

Thank you for your help, everyone!

Jonathan Lafleur
  • +1 Right on Jonathan, glad to hear it. I was about to mention something like `iconv("ISO-8859-1", "UTF-8", $field)` as per http://php.net/manual/en/function.iconv.php – Funk Forty Niner Jul 23 '14 at 15:29
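For reference, the `iconv()` call mentioned in the comment above might be used as follows. This is a sketch that assumes the iconv extension is enabled and that the input really is ISO-8859-1; `$latin1` is an illustrative stand-in for `$field`.

```php
<?php
$latin1 = "ao\xFBt"; // "août" as ISO-8859-1 bytes

$converted = iconv("ISO-8859-1", "UTF-8", $latin1);

// iconv() returns false on failure, so check before using the result.
if ($converted === false) {
    echo "conversion failed\n";
} else {
    echo bin2hex($converted), "\n"; // 616fc3bb74
}
```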
0

Put this meta tag in your HTML code, inside <head></head>:

<meta charset="UTF-8">
Khalid
  • It's a good idea if one opens files locally, but it won't help if the server sends the wrong charset in its headers. – VMai Jul 22 '14 at 19:14
0

It seems your server is not configured to send the header

content-type: text/html; charset=UTF-8

by default. You could change your server configuration, or you could add, at the very start of your script,

<?php
    header("content-type: text/html; charset=UTF-8");
?>

to set this header yourself.

VMai
0

You need to use:

  <?php
  // mysqli replaces the mysql_* functions, which were removed in PHP 7.
  $conn = mysqli_connect("localhost", "root", "root", "test");

  mysqli_set_charset($conn, "utf8"); // call this right after connecting, before any query
Bahrami-Reza