I tried to read a Vietnamese address from a MySQL database through a Perl program, but it displays some special characters that are not recognized. When I view the string in phpMyAdmin it is fine, and retrieving and printing the string with PHP is also no problem. Only when Perl prints the string does it come out weird.

Original text

QL37, Phố Vôi, tt. Vôi, Lạng Giang, Bắc Giang, Vietnam 

After printing it becomes

QL37, Phoá Voâi, tt. Voâi, Laïng Giang, Baéc Giang, Vietnam

The database structure

CREATE TABLE `address` (
    `Address_Id` int(11) NOT NULL,
    `Coordinate_Lat` float(15,6) NOT NULL,
    `Coordinate_Long` float(15,6) NOT NULL,
    `Address` varchar(300) COLLATE utf8_unicode_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Perl Code

my $sql = "SELECT * FROM `address`";
my $sth = $DBIconnect->prepare($sql);
$sth->execute or die "SQL Error: $DBI::errstr\n";
while (my $row = $sth->fetchrow_hashref) {
    print $row->{'Address'};
}

Screenshots

LINK1 - phpMyAdmin looks fine

LINK2 - Wrong output when retrieved and printed from Perl

chocoboon
  • Can you post your DB connection here? The issue probably comes from there. – Ynhockey Nov 15 '17 at 15:38
  • What @Ynhockey means is, we'd like to see the Perl code that creates your database handle. Without the real username and password, obviously. Please [edit] the question to include that. It would also be useful to know how the table is set up. The results of `SHOW CREATE TABLE` and the default database collation would help, too. – simbabque Nov 15 '17 at 15:50
  • Converting that output to [DOS character set](https://en.wikipedia.org/wiki/Code_page_437) and decoding the result as UTF-8, it appears that the input is [VNI encoded](http://vietunicode.sourceforge.net/charset/). For example `a├»` is `"a\x{C3}\x{AF}"` in CP-437, which is `"a\x{EF}"` after UTF-8 decode, which is `ạ` (LATIN SMALL LETTER A WITH DOT BELOW) in the VNI encoding. – mob Nov 15 '17 at 16:36
  • Do you use the mysql_enable_utf8 attribute when connecting to the db? – jira Nov 15 '17 at 18:00
  • Do you pass `mysql_enable_utf8mb4 => 1` to `connect`? And how do you encode the output to your terminal? – ikegami Nov 15 '17 at 19:44
  • Isn't the mysql_enable_utf8 setting only for inserts? Does it need to be set for SELECT commands too? The database collation has already been changed to utf8_unicode_ci and the result is still the same. – chocoboon Nov 16 '17 at 04:28
  • See also this question on how to display utf8 chars in windows cmd window: https://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how – jira Nov 16 '17 at 15:09
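
The decoding chain mob describes in the comments can be sketched in Perl as follows. This is only an illustration of that hypothesis: the `%vni_to_unicode` table and the `vni_decode` helper are hypothetical names, and the table holds just the four pairs that can be read off the question's own before/after strings, not the full VNI character set.

```perl
use strict;
use warnings;
use utf8;                          # this source file contains UTF-8 literals
use Encode qw(encode decode);

# Step 1: what the DOS console showed, per the comment: "a├»".
my $console_text = "a\x{251C}\x{BB}";

# Step 2: map those glyphs back to the CP-437 bytes behind them.
my $bytes = encode('cp437', $console_text);    # bytes 61 C3 AF

# Step 3: those bytes happen to be valid UTF-8, so decode them.
my $decoded = decode('UTF-8', $bytes);         # "a" followed by U+00EF

# Step 4: VNI writes a base letter followed by a mark character.
# Pairs taken from the question's before/after strings only.
my %vni_to_unicode = (
    'oá' => 'ố',
    'oâ' => 'ô',
    'aï' => 'ạ',
    'aé' => 'ắ',
);

# Replace every known two-character VNI pair with its Unicode letter.
sub vni_decode {
    my ($text) = @_;
    my $pairs = join '|', keys %vni_to_unicode;
    $text =~ s/($pairs)/$vni_to_unicode{$1}/g;
    return $text;
}
```

If the stored data really is VNI text that was saved into a utf8 column, then the Perl output is a faithful copy of what is stored, and the fix belongs on the data side rather than in the print statement.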

2 Answers

You need to add -C to the command line; see perldoc perlrun.

Or add #!/usr/bin/perl -C as the first line to enable Unicode output.
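
An alternative to the -C switch is to set the output layer inside the script. This matters because perlrun notes that since Perl 5.10.1 a -C on the #! line must also be given on the command line. The sketch below assumes the terminal actually expects UTF-8; on a Windows console that is an assumption about the asker's setup and may require changing the code page first.

```perl
use strict;
use warnings;
use utf8;    # the literals in this source file are UTF-8

# Encode everything printed to STDOUT and STDERR as UTF-8,
# roughly what the -CO and -CE switches would do.
binmode(STDOUT, ':encoding(UTF-8)');
binmode(STDERR, ':encoding(UTF-8)');

my $address = "QL37, Phố Vôi, tt. Vôi, Lạng Giang, Bắc Giang, Vietnam";
print "$address\n";
```

This only helps if `$address` holds decoded characters. Strings that come back from the database as raw bytes would be encoded a second time.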

rurban

I did try setting it at the DBI connection with the following code, but still got the same result.

my $dsn= "dbi:mysql:database=$database:$host;mysql_connnect_timeout=15;user=$user;password=$password";
my $DBI = DBI->connect($dsn, {mysql_enable_utf8 => 1}) or die "Unable to connect: $DBI::errstr\n";
$DBI->do("SET NAMES utf8");
$DBI->do("SET CHARACTER SET utf8"); 
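
One thing that stands out in the snippet above: `DBI->connect` takes the data source, the username, the password, and then the attribute hash reference, in that order. A hash passed as the second argument lands in the username slot, so `mysql_enable_utf8` may never take effect. A hedged sketch of the conventional call follows. `connect_args` and `open_handle` are hypothetical helper names, the placeholder credentials are made up, and the DSN uses the usual `database=...;host=...` form, which is an assumption about what was intended.

```perl
use strict;
use warnings;

# Build the four arguments DBI->connect expects, in order:
# data source, username, password, attribute hash reference.
sub connect_args {
    my ($database, $host, $user, $password) = @_;
    my $dsn = "dbi:mysql:database=$database;host=$host;mysql_connect_timeout=15";
    my %attr = (
        RaiseError        => 1,    # die on errors instead of checking each call
        mysql_enable_utf8 => 1,    # treat text to and from MySQL as UTF-8
    );
    return ($dsn, $user, $password, \%attr);
}

# Open a handle; DBI is loaded only when this is actually called.
sub open_handle {
    require DBI;
    return DBI->connect(connect_args(@_));
}

# Hypothetical usage with placeholder values:
# my $dbh = open_handle('mydb', 'localhost', 'someuser', 'secret');
```

With the attribute set at connect time, the driver handles both directions, so it applies to SELECT results as well as to inserted values. ikegami's comment suggests `mysql_enable_utf8mb4` as the attribute to use where the installed DBD::mysql supports it.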

After that I copied out the string and tested it without a database connection, using Encode for printing. Only cp850 and cp437 convert at least the symbol ô.

use utf8;
use Encode 'encode';

$address="QL37, Phố Vôi, tt. Vôi, Lạng Giang, Bắc Giang, Vietnam";

print $address."\n";
# QL37, Ph? V⌠i, tt. V⌠i, L?ng Giang, B?c Giang, Vietnam
print encode cp850 => $address."\n";
# QL37, Ph? Vôi, tt. Vôi, L?ng Giang, B?c Giang, Vietnam
print encode cp1252 => $address."\n";
# QL37, Ph? V⌠i, tt. V⌠i, L?ng Giang, B?c Giang, Vietnam
print encode cp437 => $address."\n";
# QL37, Ph? Vôi, tt. Vôi, L?ng Giang, B?c Giang, Vietnam
print encode cp1258 => $address."\n";
# QL37, Ph? V⌠i, tt. V⌠i, L?ng Giang, B?c Giang, Vietnam
print encode VISCII => $address."\n";
# QL37, Ph? V⌠i, tt. V⌠i, L?ng Giang, B?c Giang, Vietnam
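
Because the console itself may be mangling what is printed, a check that does not depend on the terminal can help here. The sketch below dumps code points and bytes as plain ASCII hex. `code_points` and `hex_bytes` are hypothetical helper names, not part of Encode.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Describe a character string as a list of Unicode code points.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $string;
}

# Describe a byte string as a list of hex bytes.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

my $word = "Phố";
print code_points($word), "\n";                   # U+0050 U+0068 U+1ED1
print hex_bytes(encode('UTF-8', $word)), "\n";    # 50 68 E1 BB 91
```

Running the same two helpers on a value fetched from the database shows whether the string holds the expected Vietnamese code points or something else, whatever the console displays.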

Actually the main problem is that after I process the data I need to insert it back into the database; the print command is just for debugging. I have tried to insert the address back into MySQL, and the string becomes weird there too.

chocoboon