
I've got the following line in my .procmailrc on an SMTP server:

BODY=`formail -I ""`

Later I echo this body to a local file:

echo "$BODY" >> $HOME/$FILENAME; \

I've also tried printf (with the same result):

printf "$BODY" >> $HOME/$FILENAME; \

When I read this file I can see that the encoding has been changed. Here's what I get:

Administrator System=C3=B3w

while it should be (in Polish):

Administrator Systemów

How can I decode/encode the body, either directly in .procmailrc or later (bash/python), to get the right string?

Another line in my .procmailrc works properly, but it needs an additional pipe through a Perl decoder:

SUBJECT=`formail -xSubject: | tr -d '\n' | sed -e 's/^ //' | /usr/bin/perl -MEncode -ne 'print encode ("utf8",decode ("MIME-Header",$_ )) '`

SUBJECT contains UTF8 characters and everything looks OK. Maybe there's a way to use a similar solution with the body of the mail?
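For comparison, the header decoding done by the Perl one-liner can be sketched in Python with the standard `email.header` module. This is only an illustration: the helper name `decode_mime_header` and the sample header value are made up for the example and do not come from my actual mail.

```python
from email.header import decode_header, make_header


def decode_mime_header(raw_value):
    """Decode an RFC 2047 encoded header value into a Unicode string.

    Hypothetical helper, named for this example only.
    """
    # decode_header splits the value into (bytes, charset) chunks;
    # make_header joins them back into a single Header object.
    return str(make_header(decode_header(raw_value)))


# Sample value (an assumption): a Subject header encoded as UTF-8 / Q-encoding.
sample_subject = "=?UTF-8?Q?Administrator_System=C3=B3w?="
print(decode_mime_header(sample_subject))
```

Note that this handles headers only. The body is encoded differently (via Content-Transfer-Encoding), so the same call would not decode it.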

OK. I finally got everything up and running. Here's what I did:

First the .procmailrc file:

VERBOSE=yes
LOGFILE=$HOME/procmail.log
:0f
* ^From.*(some_address@somedomain.com)
| $HOME/python_script.py

Now to the python_script.py:

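#!/usr/bin/python

from email.parser import Parser
import sys

# Parse the complete message (headers and body) that procmail pipes to stdin.
message = Parser().parse(sys.stdin)

# Binary mode: get_payload(decode=True) returns the decoded raw bytes.
temp_file = open("/home/(user)/file.txt", "wb")
temp_file.write(b"START\n")

if not message.is_multipart():
        # decode=True undoes the quoted-printable/base64 transfer encoding.
        temp_file.write(message.get_payload(decode=True))
else:
        for part in message.get_payload():
                if part.get_content_type() == 'text/plain':
                        temp_file.write(part.get_payload(decode=True))

temp_file.close()

# Note: with :0f the script must also write the message to stdout;
# those lines are omitted here.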

The most difficult part to debug was the .procmailrc recipe: I had to test many flag combinations (:0, :0f, :0fbW, etc.) before I found the one that worked best.

The next problematic step was decoding the $BODY part directly in .procmailrc. I solved that by getting rid of all of it and moving everything into a Python script, just as tripleee suggested.

Jaro
  • I think this is more likely a bash thing than python. what version is your bash? – Anzel Oct 16 '14 at 11:42
  • GNU bash, version 4.2.37(1) – Jaro Oct 16 '14 at 11:45
  • Does `echo "$BODY"` return correct unicode? – Anzel Oct 16 '14 at 11:48
  • No. The "" quotation marks only keep the \n characters exactly where they should be, but it looks like echo is losing the original utf-8 encoding of the mail message. – Jaro Oct 16 '14 at 11:55
  • I believe you're using a non-UTF-8 terminal. I have posted an answer that will hopefully solve your problem. – Anzel Oct 16 '14 at 12:11
  • `:0f` is incorrect if you are not writing a (possibly modified) message to stdout. Lose the `f` flag. If you want processing to continue even if the recipe succeeds, add a `c` flag instead. – tripleee Oct 22 '14 at 14:22
  • Please don't update your question. Instead, post an answer of your own. – tripleee Oct 22 '14 at 14:47
  • :0f works fine for me, because I do print the message to stdout. This specific functionality is provided directly in the script. – Jaro Oct 23 '14 at 06:35
  • The script you posted is not printing anything at all. I think you will find an error message in your Procmail log "rescue of unfiltered data succeeded". – tripleee Oct 23 '14 at 06:52
  • Yes, my bad... I intentionally deleted those lines from the above code, as they are irrelevant to the question. – Jaro Oct 23 '14 at 07:34

2 Answers


The encoding has not been changed; rather, you are zapping the headers, so the correct Content-Type: header is no longer present (you should also keep Mime-Version: and any other standard Content-* headers).

You should see, by examining the source of the message in your mail client, that neither Procmail nor Bash has actually changed anything. The text you receive is in fact literally Administrator System=C3=B3w, but the MIME headers inform your email client that this is Content-Transfer-Encoding: quoted-printable and Content-Type: text/plain; charset="utf-8", so it knows how to decode and display the text correctly.
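To make the two layers concrete, here is a minimal sketch of what a mail client does with that string, assuming the headers say quoted-printable and utf-8 as described. The variable names are only for illustration.

```python
import quopri

# The raw body exactly as it appears in the saved file (quoted-printable).
raw = b"Administrator System=C3=B3w"

# Step 1: undo the Content-Transfer-Encoding; this yields raw UTF-8 bytes.
decoded_bytes = quopri.decodestring(raw)

# Step 2: interpret those bytes using the charset named in Content-Type.
text = decoded_bytes.decode("utf-8")

print(text)
```

Both pieces of information (the transfer encoding and the charset) live in the headers, which is why they have to survive until the body has been decoded.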

If you want just the payload, you will need to decode it yourself, but in order to do that, you need this information from the MIME headers, so you should not kill them before you have handled the message (if at all). Something like this, perhaps:

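from email.parser import Parser
import sys

# Parse headers and body together; the MIME headers say how to decode.
message = Parser().parse(sys.stdin)
# get_content_maintype() falls back to 'text' if Content-Type is missing.
if message.get_content_maintype() == 'text':
    # decode=True undoes the Content-Transfer-Encoding and returns bytes.
    payload = message.get_payload(decode=True)
    # Falling back to utf-8 when no charset is declared is an assumption.
    print(payload.decode(message.get_content_charset() or 'utf-8'))
else:
    raise ValueError('not a single-part text message')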

This is extremely simplistic in that it assumes (like your current, even more broken solution) that the message contains a single, textual part. Extending it to multipart messages is not technically hard, but how exactly you do that depends on what sort of multiparts you expect to receive, and what you want to do with the payload(s).
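As one possible direction for the multipart case, the sketch below walks every part of the message and keeps only the `text/plain` ones. It assumes that is what you want from a multipart message; the helper name `text_parts`, the sample message, and the utf-8 fallback for parts without a declared charset are all choices made for this example.

```python
from email import message_from_string


def text_parts(message):
    """Yield the decoded text of every text/plain part (hypothetical helper)."""
    # walk() visits the message itself and then all nested parts.
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        # decode=True undoes quoted-printable or base64 and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Assumption: fall back to utf-8 when the part declares no charset.
        charset = part.get_content_charset() or "utf-8"
        yield payload.decode(charset, errors="replace")


# Made-up sample message with a plain-text and an HTML alternative.
SAMPLE = """\
From: sender@example.com
Subject: test
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XXX"

--XXX
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Administrator System=C3=B3w
--XXX
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

<p>Administrator System=C3=B3w</p>
--XXX--
"""

for text in text_parts(message_from_string(SAMPLE)):
    print(text)
```

In a Procmail pipeline you would parse `sys.stdin` instead of the sample string. Attachments and HTML-only messages are skipped here, which may or may not be what you need.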

Like in your previous question I would like to suggest that you move more, or all, of your email manipulation into Python, if you are going to be using it anyway. Procmail has no explicit MIME support so you would have to reinvent all of that in Procmail, which is neither simple nor particularly fruitful.

tripleee
  • Thank you for your suggestions! They were very useful. You pointed me in the right direction. I updated my question with the solution I found. – Jaro Oct 22 '14 at 13:06

I think your echo may not be writing correct Unicode to the file in the first place. Here are two of many possible solutions that may help:

Use echo with escape interpretation enabled:

echo -e "$BODY" >> $HOME/$FILENAME; \

Or use iconv or a similar tool to convert the file to UTF-8, assuming iconv is available on your Linux system:

iconv -t UTF-8 original.txt > encoded_result.txt
Anzel
  • Doesn't work. I think it's related to the way formail treats input. I updated the question in this regard. – Jaro Oct 16 '14 at 12:31
  • Alright, perhaps converting the file to UTF-8 is one solution. I'm not familiar with the Perl encoder, so I'll have to leave this to others to help you. Good luck! – Anzel Oct 16 '14 at 12:38