
I set out to write a simple procmail recipe that would forward the mail if it found the text "ABC Store: New Order" in the subject:

:0
* ^(To|From).*abc@cdefgh.com
* ^Subject:.*ABC Store: New Order*
{
....
}

Unfortunately, the Subject: header of the messages coming from the mail server is in MIME encoded-word syntax:

Subject: =?UTF-8?B?QUJDIFN0b3JlOiBOZXcgT3JkZXI=?=

The above subject is in the UTF-8 / ISO-8859-1 charset, so I was wondering whether there are any mechanisms, scripts, or utilities to parse it and convert it to a plain string so that I could apply my procmail filter.

tripleee
MON
  • What you are looking at is an RFC 2047-encoded header. As the charset part says, it is in UTF-8, base64-encoded. There is no ISO-8859-1 here (that's a different encoding; it can't be in ISO-8859-1 aka Latin-1 if it's in UTF-8). – tripleee Apr 20 '15 at 08:00
  • In the general case, the repertoire of UTF-8 is much larger than the repertoire of ISO-8859-1, so you will not always be able to translate UTF-8 to ISO-8859-1. If you only care about unwrapping the RFC2047 encoding and recovering the UTF-8 text, that's always possible (and perhaps a better thing to do). – tripleee Apr 20 '15 at 08:03
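The encoded-word format tripleee describes is `=?charset?encoding?encoded-text?=`, where the encoding is `B` (base64) or `Q` (a quoted-printable variant). As an illustration only, here is a small Python sketch (Python is not used elsewhere in this thread, and the helper `decode_encoded_word` is hypothetical) that unpacks a single `B` word by hand: base64-decode the payload to bytes, then interpret those bytes in the named charset. It does not handle `Q` encoding or headers containing several encoded-words, so a library decoder like the ones in the answers is the better choice in practice.

```python
import base64
import re

# An RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def decode_encoded_word(word: str) -> str:
    """Decode a single B-encoded RFC 2047 word (illustrative sketch only)."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError(f"not an encoded-word: {word!r}")
    charset, encoding, payload = match.groups()
    if encoding.upper() != "B":
        # Q encoding is deliberately not handled in this sketch.
        raise NotImplementedError("only B (base64) encoding is handled here")
    raw_bytes = base64.b64decode(payload)  # base64 text -> raw bytes
    return raw_bytes.decode(charset)       # bytes in the named charset -> str


if __name__ == "__main__":
    subject = "=?UTF-8?B?QUJDIFN0b3JlOiBOZXcgT3JkZXI=?="
    print(decode_encoded_word(subject))  # ABC Store: New Order
```

The two-step decode mirrors the header itself: `B` says how the bytes were wrapped, and `UTF-8` says how to read those bytes as text.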

2 Answers


You may use a Perl one-liner to decode the Subject: header before assigning it to a procmail variable.

# Store "may be encoded" Subject: into $SUBJECT after conversion to ISO-8859-1
:0 h
* ^Subject:.*=\?
SUBJECT=| formail -cXSubject: | perl -MEncode=from_to -pe 'from_to $_, "MIME-Header", "iso-8859-1"'

# Store all remaining cases of Subject: into $SUBJECT
:0 hE
SUBJECT=| formail -cXSubject:

# trigger recipe based also on $SUBJECT content
:0
* ^(To|From).*abc@cdefgh.com
* SUBJECT ?? ^Subject:.*ABC Store: New Order
{
....
}

Comment (2020-03-07): It may be better to convert to UTF-8 charset instead of ISO-8859-*.
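For readers without Perl, the same decoding step can be sketched with Python's standard `email.header` module (`decode_header` followed by `make_header`). This is a swapped-in alternative, not what the answer above uses, and the helpers `decode_header_line` and `filter_stream` are hypothetical names. Unlike the Perl one-liner, it yields Unicode text (written out as UTF-8 by `filter_stream` if the output stream uses UTF-8), in line with the 2020 comment above.

```python
import sys
from email.header import decode_header, make_header


def decode_header_line(line: str) -> str:
    """Return a header line with any RFC 2047 encoded-words decoded."""
    # decode_header splits the line into (bytes-or-str, charset) chunks;
    # make_header reassembles them, and str() gives the Unicode text.
    return str(make_header(decode_header(line.strip())))


def filter_stream(src, dst) -> None:
    """Decode each header line read from src and write it to dst."""
    for raw in src:
        dst.write(decode_header_line(raw) + "\n")


if __name__ == "__main__":
    # Demo on the Subject: header from the question.
    print(decode_header_line("Subject: =?UTF-8?B?QUJDIFN0b3JlOiBOZXcgT3JkZXI=?="))
```

Calling `filter_stream(sys.stdin, sys.stdout)` instead of the demo line would turn this into a filter that could, presumably, stand in for the `perl` stage after `formail -cXSubject:`; that procmail wiring is an untested assumption.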

AnFi
  • Nice. I had no idea that `MIME-Header` was an available encoding – Borodin Apr 18 '15 at 16:12
  • Though the `r*` in the regex `New Order*` is kind of silly, and arguably wrong. – tripleee Apr 20 '15 at 04:52
  • Why is the command for the "remaining cases" like this: `SUBJECT=| formail -cXSubject` **without a colon**, unlike the command for the first case: `SUBJECT=| formail -cXSubject: |`? – imz -- Ivan Zakharyaschev Mar 26 '17 at 16:15
  • I have fixed the example to match the syntax in the `man formail` examples. A basic test of `formail -cXSubject` seems to produce correct results too. – AnFi Mar 26 '17 at 23:52
  • The argument to `formail -x` is just a string prefix; without the colon you will extract every header which *starts* with `Subject`; of course, in practice, unless you are running a fuzz tester or something, only `Subject:` will actually match. – tripleee Sep 03 '20 at 09:35

You should use MIME::EncWords, like this:

use strict;
use warnings;
use 5.010;

use MIME::EncWords 'decode_mimewords';

my $subject = '=?UTF-8?B?QUJDIFN0b3JlOiBOZXcgT3JkZXI=?=';
my $decoded = decode_mimewords($subject);
say $decoded;

Output:

ABC Store: New Order
Borodin
  • This only unwraps the RFC2047 encoding; the result is still in UTF-8. Because the OP's regex doesn't contain any characters where the encoding differs between ISO-8859-1 and UTF-8, it doesn't seem to matter; but if you want to match text which is not pure ASCII, the encoding does matter, and you should know which encoding you are using. (As I argue in another comment, I would actually suggest keeping everything in UTF-8; but that is perhaps not what the OP is requesting, though the question is unclear on this part.) – tripleee Apr 20 '15 at 08:05
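To make tripleee's point about non-ASCII text concrete, here is a short Python sketch comparing the bytes the same text produces in UTF-8 and ISO-8859-1 (the sample strings and the `fits_latin1` helper are made up for illustration). As far as I know, procmail's regular expressions operate on raw bytes rather than characters, so a pattern containing `ü` saved in one encoding would not match a header converted to the other; and some UTF-8 text, such as the euro sign, has no ISO-8859-1 form at all.

```python
# Compare how the same text is encoded in UTF-8 and ISO-8859-1 (Latin-1).
ascii_text = "ABC Store: New Order"
# Pure ASCII: identical bytes in both encodings, so the OP's pattern
# matches whichever encoding the decoded header ends up in.
assert ascii_text.encode("utf-8") == ascii_text.encode("iso-8859-1")

# A non-ASCII character present in both repertoires gets different bytes.
accented = "Bestellung für ABC"  # hypothetical subject text
utf8_bytes = accented.encode("utf-8")         # 'ü' -> b'\xc3\xbc'
latin1_bytes = accented.encode("iso-8859-1")  # 'ü' -> b'\xfc'
print(utf8_bytes)
print(latin1_bytes)


def fits_latin1(text: str) -> bool:
    """True if every character of text can be represented in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


# The euro sign is outside ISO-8859-1, so this conversion would fail.
print(fits_latin1("Price: 10 €"))
```

Keeping the decoded header in UTF-8 and writing the procmail pattern in UTF-8 as well avoids both the mismatch and the lossy conversion.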