4

I have a Laravel application that works with a database with charset latin1 (I cannot change it).

In my .env I have the following configuration:

DB_CONNECTION_INTRANET=mysql
DB_HOST_INTRANET=xxxxx
DB_PORT_INTRANET=3306
DB_DATABASE_INTRANET=xxx
DB_USERNAME_INTRANET=xxx
DB_PASSWORD_INTRANET=xxx
DB_CHARSET_INTRANET='latin1'
DB_COLLATION_INTRANET='latin1_swedish_ci'

In config/database.php I use this configuration:

'mysqlIntranet' => [
    'driver'    => env('DB_CONNECTION_INTRANET'),
    'host'      => env('DB_HOST_INTRANET'),
    'port'      => env('DB_PORT_INTRANET'),
    'database'  => env('DB_DATABASE_INTRANET'),
    'username'  => env('DB_USERNAME_INTRANET'),
    'password'  => env('DB_PASSWORD_INTRANET'),
    'unix_socket' => env('DB_SOCKET_INTRANET', ''),
    'charset' => env('DB_CHARSET_INTRANET'),
    'collation' => env('DB_COLLATION_INTRANET'),
],

And the model use this connection:

class Cliente extends Model
{
    protected $connection = 'mysqlIntranet';

When I perform SELECT estado FROM clientes (I work from console) I get the following result:

+---------------+
| estado        |
+---------------+
| Informado     |
| Contratación  |

However, although I configure charset in Laravel, when I run this code:

$client = Cliente::query()->first();
$estado = $client->getAttribute('estado');
var_dump($estado . " - " . mb_detect_encoding($estado));
$estado2 = utf8_encode($estado);
var_dump($estado2 . " - " . mb_detect_encoding($estado2));

I get this strange result:

string(20) "Contrataci�n - UTF-8"
string(21) "Contratación - UTF-8"

I don't understand why the configuration of .env doesn't do the work - any ideas?

====================== UPDATE ======================
I tried to simulate the problem in local, and found out that Laravel works fine. I created this DB:

CREATE DATABASE my_db CHARACTER SET latin1 COLLATE latin1_swedish_ci;

with .env values:

DB_CHARSET_INTRANET="latin1"
DB_COLLATION_INTRANET='latin1_swedish_ci'

and everything worked fine, as expected.

I decided to try setting the following configuration in my server:

DB_CHARSET_INTRANET="utf8"
DB_COLLATION_INTRANET='utf8_general_ci'

and to my big surprise I managed to retrieve the values correctly!

Just in case, I checked directly in the MySql server:

> SELECT * FROM information_schema.SCHEMATA  WHERE schema_name = "xxx";
+--------------+-------------+----------------------------+------------------------+----------+
| CATALOG_NAME | SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME | SQL_PATH |
+--------------+-------------+----------------------------+------------------------+----------+
| NULL         | xxx         | latin1                     | latin1_swedish_ci      | NULL     |
+--------------+-------------+----------------------------+------------------------+----------+

> SELECT T.table_name, CCSA.* 
  FROM information_schema.`TABLES` T, information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA 
  WHERE CCSA.collation_name = T.table_collation AND T.table_schema = "xxx";
+-------------+-------------------+--------------------+
| table_name  | COLLATION_NAME    | CHARACTER_SET_NAME |
+-------------+-------------------+--------------------+
| clientes    | latin1_swedish_ci | latin1             |
| clientes_sm | latin1_swedish_ci | latin1             |
+-------------+-------------------+--------------------+

So now I have things work, I would only like to understand why...

guyaloni
  • 4,972
  • 5
  • 52
  • 92
  • 1
    Where is your database connection established? Have you checked if the .env variables are applied correctly? – Blackbam Jan 31 '19 at 15:19
  • 1
    Check out database.php in config dir. Hope that you would find what you need. – unclexo Jan 31 '19 at 15:24
  • @Blackbam, how can I check it? – guyaloni Jan 31 '19 at 15:36
  • @unclexo, just added the relevant information to the question, thanks. – guyaloni Jan 31 '19 at 15:36
  • Did you try clearing config cache? BTW you can check env vars by echoing config('database.connections.mysqlIntranet.collation') – unclexo Jan 31 '19 at 16:00
  • @unclexo, it is a fresh deploy, anyway I did and it did not fix the problem. – guyaloni Jan 31 '19 at 17:09
  • @unclexo, echoing `config('database.connections.mysqlIntranet.charset')` gives `latin1`, and `config('database.connections.mysqlIntranet.collation')` gives `latin1_swedish_ci` - as expected. – guyaloni Jan 31 '19 at 17:10
  • Last, you can try adding `'options' => [PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES latin1"]` to your db configuration. – unclexo Jan 31 '19 at 19:51
  • Something in the data flow using utf8 instead of latin1. Find that and fix it. – Rick James Jan 31 '19 at 23:09
  • @RickJames, this is quite obvious, the problem is that I do not find it! – guyaloni Feb 01 '19 at 08:24
  • @guyaloni - Encoding in the client; Connection (to mysql) parameters that specify the client encoding; Column definition in table; `SHOW VARIABLES LIKE `char%';` See "Best Practice" in https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored – Rick James Feb 01 '19 at 16:42

1 Answers1

0

mb_detect_encoding cannot be trusted, it often tells something is encoded in utf8 when it's not. Very least use the third parameter to make it "strict": http://php.net/manual/en/function.mb-detect-encoding.php

A method that can be trusted is looking at the individual bytes. If ó is encoded as a single byte, it may be latin1, if it's two bytes it may be utf8. The php function bin2hex converts a string into hex and it makes it easier for you to check what bytes it contains.

From your php output, the string is definitely encoded in latin1. The reason why you see "Contrataci�n" is because the program that shows your output (console? Web page? Log viewer?) is assuming your output is in UTF-8, and it's not, it's latin1.

Joni
  • 108,737
  • 14
  • 143
  • 193
  • Thanks, good explanation for the wrong detection. Still, my problem is why Laravel DB engine does not read correctly entries from database *although* the connection knows it is latin1. – guyaloni Feb 01 '19 at 08:23
  • Why do you say it doesn't read correctly? To me it looks like your problem is on the display side. Or do you want to keep UTF-8 as the page encoding while keeping the back end as latin1? – Joni Feb 01 '19 at 12:41
  • I read data from DB and send it as json to another service. In order to debug the information, I use debug messages to console, there I see the problem. – guyaloni Feb 01 '19 at 15:35
  • The JSON specification requires you use UTF-8. Even in PHP, if you include a non-utf8 string in what you pass to the `json_encode` function, it will fail with [`JSON_ERROR_UTF8`](http://php.net/manual/en/function.json-last-error.php). So you have two problems: your console does not display latin1 correctly, and your json interface will not work – Joni Feb 01 '19 at 17:06
  • 1
    Why can't you change the database connection encoding to utf8 in your .env? Changing the connection encoding means the database will convert data from the storage encoding. Your app will use utf8 to communicate with the DB, while internally the DB will use latin1 for storage. That way you don't need to change any settings in your console, or jump through hoops to be able to use `json_encode`. – Joni Feb 01 '19 at 17:12