Encoding trouble when writing in database (Perl)

Question

In Perl v5.10.1, I try to read a file and store strings in a database. Problems arise when strings contain accents and exotic characters.

On my CentOS 6, the 'locale' command indicates: LANG=en_US.UTF-8

My DB is MySQL, the field I'm writing on is varchar(64) utf8_unicode_ci.

I run my test through a Putty console, set with Window > Translation > Remote character set: UTF8, though printed characters are garbled, but this is not the main problem.

Here's my script:

#!/usr/bin/perl
use warnings;
use strict;
use utf8;
use open ':std', ':encoding(UTF-8)';
use DBI;

# A test string
my $test = 'é';
print "- 1: $test\n";

# First string in my file, containing a single 'é'
my $string = '';
open(my $fh, '<', 'myFile');
while(my $line = <$fh>) {
  chomp $line;
  $string = $line;
  last;
}
close $fh;
print "- 2: $string\n";


# Writing test string and first string in DB
my $dbistring = 'DBI:mysql:database=xxxx;host=xxxx;port=xxxx';
my $socket = DBI->connect($dbistring, 'xxxx', 'xxxx');
my $cmd = 'UPDATE Strings SET string="'.$test.'" WHERE id=1';
my $request = $socket->prepare($cmd);
$request->execute();
$cmd = 'UPDATE Strings SET string="'.$string.'" WHERE id=2';
$request = $socket->prepare($cmd);
$request->execute();

The prints are as follows:

1: ▒
2: ▒

In my DB table, fields end up as:

id 1: Ã©
id 2: Ã©

To avoid a possible double-encoding from Perl string concatenation, I tried:

$string = Encode::decode('UTF-8', $string);

giving me the same result. Same if I indicate '<:encoding(UTF-8)' when opening the file.

I am much confused, as my process chain seems to be all set in UTF8. Suggestions greatly appreciated.

At the very least, you need to add `use open ':std', ':encoding(UTF-8)';` to add an encoding/decoding layer to STDIN/STDOUT/STDERR and to files opened in scope. — ikegami, May 03 '18 at 20:22

score 1 · Answer 1 · answered May 03 '18 at 18:31

1

This valuable article provided the solution:

The problem arises during the communication between DBI and the DB, and is solved by adding the mysql_enable_utf8 flag during the connection:

DBI->connect($dbistring, 'xxxx', 'xxxx', { mysql_enable_utf8 => 1 });

answered May 03 '18 at 18:31

Bob

1,495
1
19
24

I forgot it effectively does `SET NAMES utf8` when passed to `connect`, in addition to the decoding effect obtained from `$dbh->{mysql_enable_utf8} = 1;` – ikegami May 04 '18 at 20:25

score 1 · Answer 2 · answered May 04 '18 at 05:01

Some issues in Perl

use utf8;
use open ':std', ':encoding(UTF-8)';

my $dbh = DBI->connect("dbi:mysql:".$dsn, $user, $password, {
       PrintError => 0,
       RaiseError => 1,
       mysql_enable_utf8 => 1,  # Switch to UTF-8 for communication and decode.
});
# or {mysql_enable_utf8mb4 => 1} if using utf8mb4

Mojibake

See "Mojibake" in Trouble with UTF-8 characters; what I see is not what I stored for other issues to check on.

Encoding trouble when writing in database (Perl)

2 Answers2