How can I properly align UTF-8 strings with Perl's printf?

Question

What is the right way to get nicely aligned output here (all lines with the same indent)?

#!/usr/bin/env perl
use warnings;
use strict;
use DBI;

my $phone_book = [ [ qw( name number ) ],
            [ 'Kroner', 123456789 ],
            [ 'Holler', 123456789 ],
            [ 'Mühßig', 123456789 ],
            [ 'Singer', 123456789 ],
            [ 'Maurer', 123456789 ],
];

my $dbh = DBI->connect( "DBI:CSV:", undef, undef, { RaiseError => 1 } );
$dbh->do( qq{ CREATE TEMP TABLE phone_book AS IMPORT( ? ) }, {}, $phone_book );

my $sth = $dbh->prepare( qq{ SELECT name, number FROM phone_book } );
$sth->execute;

my $array_ref = $sth->fetchall_arrayref();

for my $row ( @$array_ref ) {
    printf "%9s %10s\n", @$row;
}

# OUTPUT:

#   Kroner  123456789
#   Holler  123456789
# Mühßig  123456789
#   Singer  123456789
#   Maurer  123456789

score 4 · Answer 1 · answered Sep 26 '12 at 19:05

    #!/usr/bin/env perl

    use warnings;
    use strict;

    use utf8; # This is to allow utf8 in this program file (as opposed to reading/writing from/to file handles)

    binmode( STDOUT, ':encoding(UTF-8)' ); # Allow output of UTF8 to STDOUT

    my @strings = ( 'Mühßig', 'Holler' ); # UTF8 in this file, works because of 'use utf8'

    foreach my $s (@strings) { printf( "%-15s %10s\n", $s, 'lined up' ); } # should line up nicely

    open( FILE, '<', 'utf8file' ) || die("Failed to open file: $!");

    binmode( FILE, ':encoding(UTF-8)' );

    # Same as above, but on the file instead of STDOUT

    while(<FILE>) { chomp;printf( "%-15s %10s\n", $_, 'lined up' ); }

    close( FILE );

    # This works too
    use Encode;

    open( FILE, '<', 'utf8file' ) || die("Failed to open file: $!");

    while(<FILE>) {
            chomp;
            $_ = decode_utf8( $_ );
            printf( "%-15s %10s\n", $_, 'lined up' );
    }

    close( FILE );

score 4 · Accepted Answer · answered Jan 14 '10 at 15:48

I haven't been able to reproduce it, but it looks like a character encoding mismatch. Most likely your Perl source file is saved as UTF-8, but the script doesn't enable use utf8;. Perl therefore treats each non-ASCII German character as two characters (one per byte) and computes the padding from that count. The terminal is also in UTF-8 mode, so the characters still display correctly, just with too little padding. Try adding use warnings; and I'll bet you get a warning printed, and I wouldn't be surprised if adding use utf8; actually fixes the problem.
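To make the byte-versus-character point concrete, here is a minimal sketch using only the core Encode module. The variable names are made up for illustration, and the name is written with escapes so the result does not depend on how the file is saved.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $chars = "M\x{fc}h\x{df}ig";           # "Mühßig" as 6 characters
my $bytes = encode( 'UTF-8', $chars );    # the same name as raw UTF-8 bytes

printf "characters: %d\n", length $chars;
printf "bytes:      %d\n", length $bytes;

# printf pads by length(), so "%9s" adds 3 spaces to the character string
# but only 1 space to the byte string.
my $padded_chars = sprintf '%9s', $chars;
my $padded_bytes = sprintf '%9s', $bytes;

# Decoding the padded bytes shows how many characters a UTF-8 terminal
# would actually display for that field.
printf "displayed width of the byte version: %d\n",
    length decode( 'UTF-8', $padded_bytes );
```

The byte version comes out two columns short, which matches the misaligned row in the question's output.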

answered Jan 14 '10 at 15:48

Dan


"use warnings;" is already there, and when I add "use utf8" the third row looks like this : "M�h�ig 123456789". Reading from a file I have the same problem. – sid_com Jan 14 '10 at 16:15
Ok, with "binmode STDOUT, ':encoding(utf8)'" enabled too it works. – sid_com Jan 14 '10 at 16:26
@Dan: I had warnings enabled, but I didn't get any warnings. – sid_com Jan 14 '10 at 17:03
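A likely explanation for the garbled "M�h�ig" row in the comments, sketched with the core Encode module: with use utf8 but no output layer, print writes characters below 0x100 as single bytes, and a UTF-8 terminal cannot decode those. The helper hex_bytes is hypothetical, and the two encode calls only simulate what print would send in each case.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my $name = "M\x{fc}h\x{df}ig";    # "Mühßig" as a 6-character string

# No output layer: characters below 0x100 go out as one byte each,
# which is equivalent to encoding as ISO-8859-1.
my $raw = encode( 'ISO-8859-1', $name );

# With binmode( STDOUT, ':encoding(UTF-8)' ) the layer encodes to UTF-8.
my $utf8 = encode( 'UTF-8', $name );

print 'no layer:    ', hex_bytes($raw),  "\n";    # 4D FC 68 DF 69 67
print 'UTF-8 layer: ', hex_bytes($utf8), "\n";    # 4D C3 BC 68 C3 9F 69 67
```

The lone FC and DF bytes are not valid UTF-8 sequences here, so a UTF-8 terminal shows a replacement character for each.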

score 2 · Answer 3 · answered Sep 10 '11 at 13:46

You can't use Unicode with printf if you have code points that take 0 or 2 print columns instead of 1, which it appears you do.

You need to use Unicode::GCString instead.

Wrong way:

printf "%-10.10s", our $string;

Right way:

use Unicode::GCString;

my $gcstring = Unicode::GCString->new(our $string);
my $colwidth = $gcstring->columns();
if ($colwidth > 10) {
    print $gcstring->substr(0,10);
} else {
    print " " x (10 - $colwidth);
    print $gcstring;
}
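If Unicode::GCString is not installed, the same idea can be roughly approximated with core Perl and Unicode properties. This is a swapped-in technique, not what Unicode::GCString does internally: it only treats marks and format characters as zero-width and East Asian Wide/Fullwidth characters as double-width. The helpers approx_columns and pad_left are hypothetical names.

```perl
use strict;
use warnings;
use utf8;    # the string literals below are UTF-8 in this source

# Hypothetical helper: approximate display width of a decoded string.
sub approx_columns {
    my ($str) = @_;
    my $width = 0;
    for my $ch ( split //, $str ) {
        # Combining marks and format characters take no column.
        next if $ch =~ /[\p{Mn}\p{Me}\p{Cf}]/;
        # East Asian Wide and Fullwidth characters take two columns.
        $width += $ch =~ /[\p{East_Asian_Width=Wide}\p{East_Asian_Width=Fullwidth}]/
            ? 2
            : 1;
    }
    return $width;
}

# Hypothetical helper: right-justify $str in $cols columns, like "%9s".
sub pad_left {
    my ( $str, $cols ) = @_;
    my $fill = $cols - approx_columns($str);
    return ( $fill > 0 ? ' ' x $fill : '' ) . $str;
}

binmode STDOUT, ':encoding(UTF-8)';
print pad_left( $_, 9 ), " 123456789\n" for 'Kroner', 'Mühßig', '日本語';
```

For real-world text, Unicode::GCString remains the more complete choice, since it works on grapheme clusters and handles cases this sketch ignores.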

answered Sep 10 '11 at 13:46

tchrist


This and more useful @tchrist answers at http://www.perl.com/pub/2012/05/perlunicook-unicode-column-width-for-printing.html – toddkaufmann Jul 23 '15 at 16:54
