4

After migrating to a new server, upon executing a SELECT query, if a requested column value is NULL, Perl's DBI::fetchrow_array() returns what appears to be an empty string: defined() returns 1 and length() returns 0.

Everything I read tells me that I should be getting undef from a NULL, and indeed this is the way it works on my old server. The new server has a copy of the MySQL database which I migrated using the Export SQL and Import SQL features of Sequel Pro, a MySQL gui which I run on my Mac. For both databases, the values in question are clearly indicated as gray NULL in Sequel Pro, and as NULL if I run mysql interactively. For example, see name in this transcript:

mysql> SELECT * from trials WHERE id = 26069 ;
+-------+----------+-----------+---------+--------+ ...
| id    | language | numTrials | name    | status | ...
+-------+----------+-----------+---------+--------+ ...
| 26069 | en       |         3 | NULL    | Done   | ...
+-------+----------+-----------+---------+--------+ ...
1 row in set (0.00 sec)

My old server is running older packages:

  • Perl 5.10.1 vs. 5.22.1
  • DBI 1.634 vs. 1.636
  • DBD::mysql 4.022 vs. 4.033

Here is my code around fetchrow_array:

@result = $statementHandle->fetchrow_array ;

for my $value (@result) {
    if (! utf8::is_utf8($value)) {
        utf8::decode($value) ;

        # Log values for debugging the NULL/undef issue:
        my $aValue = $value ;
        my $len = length($aValue) ;
        if (!defined($value)) {
            $aValue = "<unnndefined>" ;
        }
        print {*::STDLOG} info => "daValue l=$len : $daValue\n" ;
    }
}

Big thanks to anyone who can suggest what might be going on here!

Jerry Krinock
  • 4,860
  • 33
  • 39

1 Answers1

4

You code will have produced the warning

Use of uninitialized value in subroutine entry

Your call to utf8::decode($value) attempts to convert $value to a character string. If you pass it undef then the value will be treated as the empty string (with the accompanying warning) and decoded and stored as such

It seems strange to enclose all of that code in your conditional statement. Surely you just want

utf8::decode($value) if $value and not utf8::is_utf8($value);

and then the rest of the code should be independent of the field's encoding status?

Even better, you should just make sure that DBI automatically encodes and decodes characters strings on their way to and from the database, by adding the option {mysql_enable_utf8mb4 => 1} to the database connect call. You will also need to add CHARACTER SET utf8mb4 to either the CREATE TABLE or CREATE DATABASE statements. (Don't use CHARACTER SET utf8; that is a subset of real UTF-8 limited to three bytes per character.) Of course, you can ALTER TABLE after the initial creation if you need to.

ThisSuitIsBlackNot
  • 23,492
  • 9
  • 63
  • 110
Borodin
  • 126,100
  • 9
  • 70
  • 144
  • Bingo. As you suggested, simply removing that utf8::decode($value), I now get undef as expected. – Jerry Krinock Oct 01 '16 at 15:59
  • I wasn't looking at that, because my old server has exactly that same code and works fine. Probably there is something different in the database configuration, as you suggested, and I need to dig into that as you explained. It has been on my to-do list for years to sort out some UTF8 shenanigans which are apparent in this system. It has worked fine for non-ASCII characters but there is evidence that multiple programming errors are cancelling each other out :(( – Jerry Krinock Oct 01 '16 at 16:06
  • 1
    It turned out that the old code involving `utf8::decode()` was a sloppy workaround for the fact that this database had bastard UTF-8 data which was stored as Latin-1. In addition to completely removing that code, and enabling `utf8mb4` during `connect()`, I needed to [convert the bastard data](http://stackoverflow.com/questions/9407834/mysql-convert-latin1-characters-on-a-utf8-table-into-utf8). – Jerry Krinock Oct 03 '16 at 13:23