13

What is the reason, that using CONCAT() in pure UTF-8 environment MySQL still treats concatenated string (when some col in expression is for example int or date) as some other charset (probably Latin-1)?

MySQL environment seen from client (\s):

Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8

Test dataset:

CREATE TABLE `utf8_test` ( 
    `id` int(10) unsigned NOT NULL auto_increment, 
    `title` varchar(50) collate utf8_estonian_ci default NULL,
    `year` smallint(4) unsigned NOT NULL default '0', 
    PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_estonian_ci;

INSERT INTO utf8_test VALUES (1, 'Õäöüšž', 2011);

This query is good:

SELECT id, title FROM utf8_test;

This one turns utf-8 flag off (already in MySQL, AFIU):

SELECT CONCAT(id, title) FROM utf8_test;

From mysql-client everything seems fine, because it is set to show chars as UTF-8, but when running through perl DBI, all results of queries having CONCAT() inside don't have utf-8 flag set. Example code:

#!/usr/bin/perl

use strict;
use utf8::all;
use Encode qw(is_utf8);

my $dbh = your_db_connect_routine('test');

my $str = $dbh->selectrow_array('SELECT CONCAT(id, title) FROM utf8_test');
print "CONCAT: False\n" unless ( is_utf8($str) );

my $str = $dbh->selectrow_array('SELECT title FROM utf8_test');
print "NO CONCAT: False\n" unless ( is_utf8($str) );

There is at least two workarounds i know

  • quering with CAST() SELECT CONCAT( CAST(id AS CHAR CHARACTER SET utf8), title) FROM utf8_test
  • using $str = Encode::_utf8_on($str) (is considered as bad practice?)

but i am asking: why it is in MySQL so? Should i consider it as bug or feature?

w.k
  • 8,218
  • 4
  • 32
  • 55
  • What happens in other clients besides Perl DBI? For example, a MySQL GUI tool or some other language? Also have you tried an alternate driver such as `DBD::Wire10` – Dan Jun 19 '11 at 02:49
  • @Dan: `DBD::Wire10` had same behaviour. But works right in Python. Still problem in DBI? – w.k Jun 19 '11 at 19:29

2 Answers2

22

It's a well known bug in MySQL. It's fixed in MySQL 5.5

See: http://bugs.mysql.com/bug.php?id=12030

The issue stems from concatenating an integer with a varchar.

The work around is to cast the id (integer) first to a char, and then concatenate, ie:

SELECT CONCAT(cast(id as char), title) FROM utf8_test
Eljakim
  • 6,877
  • 2
  • 16
  • 16
  • 2
    Not only integers, but also timestamps cause the problem. Thanks for the workaround! +1 – Tomas Nov 19 '12 at 12:29
  • Timestamps are anyway already stored as INTs so if you handle all INT columns, it should be fine. – dknight Dec 26 '13 at 09:03
1

It probably is DBD::mysql issue/peculiarity. Try enabling utf8 in database handle as described in POD for DBD::mysql (mysql_enable_utf8 part).

This old (Perl 5.8 times) article can also help.

Roman Grazhdan
  • 467
  • 4
  • 15
  • As i answered to Dan: same behaviour with `DBD::Wire10` driver. For `DBD::mysql` `mysql_enable_utf8` is also in connection-function. And i don't understand, how could query affect on DBI or DBD::* level? – w.k Jun 21 '11 at 21:29
  • On CentOS 7 with the vendor-supplied `perl-DBD-MySQL-4.023-6`, I still had to `$dbh->do('SET NAMES utf8');` (as described in the article you link to), even though my server, schema, table, and columns were all utf8, with utf8 collations. So it seems the Perl DBD::mysql client's seeming to default to "latin1" was the issue for me. – TheDudeAbides Aug 20 '19 at 22:17