I need advice on IMAP folder encoding.

Using my mail client (Thunderbird), I created an IMAP folder with a Russian name.

The folder name is: Проверка

On the filesystem, the folder is named: user.mylogin.&BB8EQAQ+BDIENQRABDoEMA-

I wrote this script to convert between the two forms (Perl v5.10.1):

use strict;
use warnings;
use utf8;
use Encode::IMAPUTF7;

my $folder = $ARGV[1];

binmode(STDOUT, ':utf8');

if ($ARGV[0] eq 'to') {
    print Encode::IMAPUTF7::encode('IMAP-UTF-7', $folder);
}
elsif ($ARGV[0] eq 'from') {
    print Encode::IMAPUTF7::decode('IMAP-UTF-7', $folder);
}
print "\n";

Converting the filesystem name to Russian:

[w@pandora6 tmp]$ ./imapfolder.pl from '&BB8EQAQ+BDIENQRABDoEMA-'
Проверка

That works fine.

Now the reverse conversion:

[w@pandora6 tmp]$ ./imapfolder.pl to Проверка
&ANAAnwDRAIAA0AC+ANAAsgDQALUA0QCAANAAugDQALA-

Hmm... I expected &BB8EQAQ+BDIENQRABDoEMA-

OK, let's decode that back:

[w@pandora6 tmp]$ ./imapfolder.pl from '&ANAAnwDRAIAA0AC+ANAAsgDQALUA0QCAANAAugDQALA-'
ÐÑовеÑка

WTF? I expected Проверка

What went wrong?

brian d foy
Anton Shevtsov

2 Answers

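You have been caught by one of the many gotchas of Unicode in Perl. use utf8 only tells Perl that your source code is written in UTF-8: string literals, variable names and function names in the file are decoded as UTF-8. Nothing else is affected. In particular, the strings in @ARGV are not decoded; they are still plain bytes. So Проверка arrives as 16 bytes, and Encode::IMAPUTF7 encodes each byte as a separate character, which is why the result is so much longer than expected.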
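A quick way to see the difference, as a sketch using only the core Encode module: it builds the byte string the shell would hand to the script (assuming a UTF-8 terminal) and compares it with the character string that use utf8 makes of the same literal.

```perl
use strict;
use warnings;
use utf8;                        # affects only the literals in this file
use Encode qw(encode decode);

# Because of 'use utf8', this literal is 8 characters.
my $chars = "Проверка";

# Undecoded @ARGV holds UTF-8 bytes like these (assumes a UTF-8 terminal).
my $bytes = encode('UTF-8', $chars);

printf "characters: %d\n", length $chars;    # 8
printf "bytes:      %d\n", length $bytes;    # 16

# Each byte counts as its own character: "П" becomes U+00D0 U+009F.
printf "first two: U+%04X U+%04X\n",
    ord(substr($bytes, 0, 1)), ord(substr($bytes, 1, 1));

# Decoding the bytes restores the real characters.
my $decoded = decode('UTF-8', $bytes);
print "decoded string matches the literal\n" if $decoded eq $chars;
```

Those two code points, U+00D0 and U+009F, are exactly what the wrong result starts with: &ANAAnw... is the IMAP-UTF-7 form of U+00D0 U+009F U+00D1 ..., the UTF-8 bytes of Проверка taken one at a time.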

Fortunately, there is a simple fix: use the utf8::all module from CPAN. It turns on all of the UTF-8 features you'd expect use utf8 to enable:

  • Decodes @ARGV from UTF-8 into character strings (when utf8::all is used from the main package).

  • Filehandles are opened with UTF-8 encoding turned on by default (including STDIN, STDOUT, STDERR). If you don't want UTF-8 for a particular filehandle, you'll have to set binmode $filehandle.

  • charnames are imported so \N{...} sequences can be used to compile Unicode characters based on names.

  • readdir now returns UTF-8 characters instead of bytes.

  • glob and the <> operator now return UTF-8 characters instead of bytes.

Your code then reduces to:

use strict;
use warnings;
use utf8::all;
use Encode::IMAPUTF7;

my $folder=$ARGV[1];

if ($ARGV[0] eq 'to') {
    print Encode::IMAPUTF7::encode('IMAP-UTF-7', $folder)
}
elsif ($ARGV[0] eq 'from') {
    print Encode::IMAPUTF7::decode('IMAP-UTF-7', $folder)
}
print "\n";
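If installing utf8::all isn't an option, the two features this particular script needs (decoded @ARGV and UTF-8 on the standard handles) can be approximated with core modules. This is a sketch under that assumption, not what utf8::all does internally; decode_args is a hypothetical helper name, and nothing here covers the readdir or glob behaviour listed above.

```perl
use strict;
use warnings;
use Encode qw(decode);
use open qw(:std :encoding(UTF-8));   # STDIN, STDOUT and STDERR use UTF-8

# Hypothetical helper: decode each argument from UTF-8 bytes into a
# character string, the part of utf8::all's behaviour this script relies on.
sub decode_args {
    return map { decode('UTF-8', $_) } @_;
}

@ARGV = decode_args(@ARGV);

# Each argument is now measured in characters rather than bytes.
for my $arg (@ARGV) {
    printf "%s: %d characters\n", $arg, length $arg;
}
```

Run from a UTF-8 terminal with Проверка as its argument, it should report 8 characters rather than the 16 bytes the shell passed in. In the original script, the use open line and the @ARGV = decode_args(@ARGV) line (with the helper) would go before my $folder = $ARGV[1]; and would make the binmode call unnecessary.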
Schwern

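If you don't have utf8::all installed and just want a quick one-liner, you can also use Perl's -C option to make it do everything in UTF-8. In -CSA, the S flag makes STDIN, STDOUT and STDERR use UTF-8, and the A flag makes Perl treat the elements of @ARGV as UTF-8 encoded strings.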

Example:

$ utf7=$(perl -CSA -MEncode::IMAPUTF7 -le 'print Encode::IMAPUTF7::encode("IMAP-UTF-7", shift)' "Проверка")
$ echo "$utf7"
&BB8EQAQ+BDIENQRABDoEMA-

$ perl -CSA -MEncode::IMAPUTF7 -le 'print Encode::IMAPUTF7::decode("IMAP-UTF-7", shift)' "$utf7"
Проверка
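To check the expected name independently of Encode::IMAPUTF7, the "modified UTF-7" that IMAP uses for mailbox names (RFC 3501, section 5.1.3) can be sketched with core modules: runs of printable ASCII are copied as-is (with & written as &-), and every other run is encoded as UTF-16BE, then base64 with ',' in place of '/' and no padding, wrapped in & and -. imap_utf7_encode is a hypothetical helper written for illustration; it is not that module's code, and it only encodes.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: IMAP modified UTF-7 encoder (RFC 3501, section 5.1.3).
sub imap_utf7_encode {
    my ($str) = @_;
    my $out = '';
    # Walk the string as runs of printable ASCII (0x20-0x7E) and runs of anything else.
    while ($str =~ /\G(?:([\x20-\x7e]+)|([^\x20-\x7e]+))/gc) {
        if (defined $1) {
            (my $ascii = $1) =~ s/&/&-/g;     # '&' itself is written as "&-"
            $out .= $ascii;
        }
        else {
            my $run = $2;                      # copy before calling other code
            my $b64 = encode_base64(encode('UTF-16BE', $run), '');  # no newlines
            $b64 =~ s/=+\z//;                  # modified base64 drops padding
            $b64 =~ tr{/}{,};                  # and uses ',' instead of '/'
            $out .= "&$b64-";
        }
    }
    return $out;
}

print imap_utf7_encode("Проверка"), "\n";    # &BB8EQAQ+BDIENQRABDoEMA-
```

Because this helper expects a character string, passing it an undecoded @ARGV element would reproduce the same 16-character mistake as in the question.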
mivk