26

I have a script that runs on cron that outputs some text which we send to the 'mail' program. The general line is like this:

./command.sh | mail -s "My Subject" destination@address.com -- -F "Sender Name" -f sender@address.com

The problem is that the text generated by the script contains some special characters (é, ã, ç), since it is not in English. When the e-mail is received, each of these characters is replaced by ??.

I understand that this is most likely because the encoding is not set correctly. What is the easiest way to fix this?

JohnWithoutArms
  • My text is directly ECHO'ed from the shell script. The special characters show correctly when executed from the console. – JohnWithoutArms Jun 25 '10 at 17:51
  • The headers on the e-mail show this: Content-Transfer-Encoding: 7bit – JohnWithoutArms Jun 25 '10 at 17:52
  • I'm trying to do this: echo "maçã" | mail destination@address.com. And the result received on the e-mail is: ma????. LANG is set up as pt_BR.UTF-8, and so is LC_CTYPE. – JohnWithoutArms Jun 25 '10 at 17:56
  • I think the problem is that `mail` can't deal with UTF-8 data without some tweaking. You need to either input ISO-8859-1 characters (it should be possible to switch the terminal's character encoding), or send a UTF-8 E-Mail. – Pekka Jun 25 '10 at 17:59
  • Multiple answers here blindly suggest `mail` or `mailx` solutions which work in some versions but not others. Perhaps review https://stackoverflow.com/questions/17359/how-do-i-send-a-file-as-an-email-attachment-using-linux-command-line/48588035#48588035 – tripleee Jan 05 '22 at 07:27

9 Answers

26

My /usr/bin/mail is symlinked to /etc/alternatives/mail which is also symlinked to /usr/bin/bsd-mailx

I had to specify the encoding in the mail header myself. (The -S option is not supported here.)

cat myutf8-file | mail -a "Content-Type: text/plain; charset=UTF-8" -s "My Subject" me@mail.com
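
Whether -a adds a header or an attachment depends on which program is actually behind mail, so it can help to check that first. A minimal sketch, assuming a readlink that supports -f (GNU coreutils does); resolve_command is a hypothetical helper name, not part of any mail package:

```shell
#!/bin/bash
# resolve_command: hypothetical helper that follows the symlink chain of a command.
# Assumes a readlink that supports -f (GNU coreutils does).
resolve_command() {
  local path
  path=$(command -v "$1") || return 1
  readlink -f "$path"
}

# Show what "mail" really is on this machine, if it is installed at all.
resolve_command mail || echo "mail not found in PATH" >&2
```

On the setup described above this would be expected to end at /usr/bin/bsd-mailx; other systems may resolve to a different implementation with different options.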

John Conde
KumZ
  • 5
    I get the following error when running that command: `"Content-Type: text/plain; charset=UTF-8: No such file or directory`. After consulting the manpage for `mail`, it appears that the `-a` option is supposed to be used for specifying an attachment. – Nathan Osman Mar 01 '13 at 04:25
  • 2
    @GeorgeEdison This may be distribution specific. On Ubuntu 12.04 `-a` defines additional header fields (using the package `bsd-mailx`). Also, the man page states: _"The mailx utility is compliant with the IEEE Std 1003.1-2008 (“POSIX.1”) specification. The flags [-abcdeEIv] are **extensions** to that specification."_ So, `-a` is not defined in the POSIX.1 specification. – Sebastian Krysmanski Apr 03 '13 at 13:56
  • Setting the LC_CTYPE environment variable is much cleaner. In your case you force mail to trust you. Setting LC_CTYPE makes mail aware of what you are sending to it and lets it forge a correct header. – Julien Palard May 30 '13 at 15:28
  • 3
    That works for the mail body, thanks. More generally, people need UTF-8 encoding in headers, too, as per http://tools.ietf.org/html/rfc2047 and newer, which this solution does not cover. – Stéphane Gourichon Sep 24 '13 at 09:32
  • I was forced to use `"Content-Type: text/html; charset=UTF-8"` to have consistent results with Thunderbird on Ubuntu. – godzillante Sep 12 '19 at 17:42
11

You're right in assuming this is a charset issue. You need to set the appropriate environment variables at the beginning of your crontab.

Something like this should work:

LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8

Optionally use LC_ALL in place of LC_CTYPE.

Reference: http://opengroup.org/onlinepubs/007908799/xbd/envvar.html

Edit: The reason it displays fine when you run it in your shell is probably because the above env vars are set in your shell.

To verify, execute 'locale' in your shell, then compare to the output of a cronjob that runs the same command.
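
The comparison described above can also be done with a small script that prints only the variables in question. This is a minimal sketch; report_locale is a hypothetical helper name:

```shell
#!/bin/bash
# report_locale: hypothetical helper that prints the variables relevant here.
report_locale() {
  printf 'LANG=%s\n' "${LANG:-unset}"
  printf 'LC_CTYPE=%s\n' "${LC_CTYPE:-unset}"
  printf 'LC_ALL=%s\n' "${LC_ALL:-unset}"
}

# Run this once from the terminal and once from a cron job, then compare.
report_locale
```

Redirecting the output to a file in both environments and comparing the two files with diff shows whether cron really runs with a different locale.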

Re-Edit: Ok, so it's not an env var problem.

I am assuming you're using mailx, as it is the most common nowadays. Its manpage says:

The character set for outgoing messages is not necessarily the same as the one used on the terminal. If an outgoing text message contains characters not representable in US-ASCII, the character set being used must be declared within its header. Permissible values can be declared using the sendcharsets variable.

So, try and add the following arguments when calling mail:

-S sendcharsets=utf-8,iso-8859-1
Casey
  • This will get my upvote if it works. I'm not entirely sure whether it will in this case though, as the offending characters are probably already in UTF-8 format (having been entered manually) and `mail` will hardly be able to deal with them either way? But maybe I'm overlooking something. We will see. – Pekka Jun 25 '10 at 17:54
  • I have checked and both LANG and LC_CTYPE environment variables are set up as you suggested already. – JohnWithoutArms Jun 25 '10 at 17:55
  • Interesting. On my system, /usr/bin/mail is a symlink to /usr/bin/mailx, the manpage for which says: The character set for outgoing messages is not necessarily the same as the one used on the terminal. If an outgoing text message contains characters not representable in US-ASCII, the character set being used must be declared within its header. Permissible values can be declared using the sendcharsets variable. – Casey Jun 25 '10 at 18:01
  • I added "locale" to the beginning of the script and ran "scriptName.sh" | mail destination@address.com. All lang variables are set as the result in the console (pt_BR.UTF-8) but the characters are still changed into ??. – JohnWithoutArms Jun 25 '10 at 18:03
  • I checked "man mail" on my system and it says nothing regarding character encoding or charsets, unfortunately. I'll try setting up the charset as a header and get back to you. – JohnWithoutArms Jun 25 '10 at 18:05
  • It seems I AM using mailx, as "man mailx" returns the same page as "man mail" and the command itself behaves the same. – JohnWithoutArms Jun 25 '10 at 18:22
  • Well, it seems that my version of mailx does not offer support for the -S switch. And it is the "latest" version supported on my 8.04 Ubuntu Server. I guess I'll have to try and remove the special characters from my scripts. – JohnWithoutArms Jun 25 '10 at 18:32
7

Just to add to KumZ's answer: if you need to specify more headers with the -a switch, simply repeat it, like this (note the repeated use of -a).

cat /path/to/file | mail -s "Some subject" recipient@theirdomain.com -a "From: Human Name <noreply@mydomain.com>" -a "Content-Type: text/plain; charset=UTF-8"
Fabien Haddadi
6

I've written a bash function to send an email to one or more recipients. The function sends UTF-8 encoded mails and handles UTF-8 characters in the subject and content by base64-encoding them.

To send a plain text email:

send_email "plain" "from@domain.com" "subject" "contents" "to@domain.com" "to2@domain.com" "to3@domain.com" ...

To send a HTML email:

send_email "html" "from@domain.com" "subject" "contents" "to@domain.com" "to2@domain.com" "to3@domain.com" ...

Here is the function code.

# Send an email to one or more recipients.
#
# @param string $content_type Email content MIME subtype: 'html' or 'plain'.
# @param string $from_address Sender email.
# @param string $subject Email subject.
# @param string $contents Email contents.
# @param array $recipients Email recipients.
function send_email() {
  # Return (not exit) so a bad call does not terminate the calling shell.
  [[ ${#} -lt 5 ]] && return 1

  local content_type="${1}"
  local from_address="${2}"
  local subject="${3}"
  local contents="${4}"

  # Remove all args but recipients.
  shift 4

  # Body: base64-encode the contents.
  local encoded_contents
  encoded_contents="$(base64 <<< "${contents}")"
  # Subject: encode without a trailing newline and without line wrapping.
  local encoded_subject
  encoded_subject="=?utf-8?B?$(printf '%s' "${subject}" | base64 | tr -d '\n')?="

  # Quote "$@" so each recipient stays a single word.
  local recipient
  for recipient in "${@}"; do
    if [[ -n "${recipient}" ]]; then
      sendmail -f "${from_address}" "${recipient}" \
        <<< "Subject: ${encoded_subject}
MIME-Version: 1.0
From: ${from_address}
To: ${recipient}
Content-Type: text/${content_type}; charset=\"utf-8\"
Content-Transfer-Encoding: base64
Content-Disposition: inline

${encoded_contents}"
    fi
  done

  return 0
} # send_email()
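
The subject encoding used in the function above can be tried in isolation, without sending anything. A minimal sketch, assuming the input is already UTF-8 and short enough to fit in a single encoded word; encode_subject is a hypothetical helper name:

```shell
#!/bin/bash
# encode_subject: hypothetical helper, RFC 2047 "B" encoding of a UTF-8 string.
# Assumes the input is already UTF-8 and short enough for one encoded word.
encode_subject() {
  # printf avoids the trailing newline that a here-string would add;
  # tr removes any line wrapping done by base64.
  printf '=?UTF-8?B?%s?=\n' "$(printf '%s' "$1" | base64 | tr -d '\n')"
}

# "maçã" written as UTF-8 octal escapes so the script itself stays ASCII.
encode_subject "$(printf 'ma\303\247\303\243')"
# prints: =?UTF-8?B?bWHDp8Oj?=
```

Feeding the string through printf rather than a here-string matters here: a here-string appends a newline, which would end up inside the encoded subject.
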
Biapy
3

You can call sendmail directly, without the mail wrapper/helper.
That lets you generate all the headers required for a "raw" UTF-8 body
(UTF-8 is mentioned in the asker's comments).

WARNING-1:
Non-7bit/ASCII characters in headers (e.g. Subject:, From:, To:) require special encoding.
WARNING-2:
sendmail may break long lines (>990 bytes).

SENDER_ADDR=sender@address.com
SENDER_NAME="Sender Name"
RECIPIENT_ADDR=destination@address.com
(
# BEGIN of mail generation chain of commands
# "HERE" document with all headers and headers-body separator
cat << END
Subject: My Subject
From: $SENDER_NAME <$SENDER_ADDR>
To: $RECIPIENT_ADDR
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

END
# custom script to generate email body
./command.sh
# END   of mail generation chain of commands
) | /usr/sbin/sendmail -i -f "$SENDER_ADDR" -F "$SENDER_NAME" "$RECIPIENT_ADDR"
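
To inspect the generated message before handing it to sendmail, the header part can be wrapped in a function that only prints. A minimal sketch under that assumption; build_message is a hypothetical helper name and the addresses are the placeholders used above:

```shell
#!/bin/bash
# build_message: hypothetical helper; it only prints the message, it does not send it.
# Arguments: sender, recipient, subject (ASCII only, see WARNING-1). Body comes from stdin.
build_message() {
  local from="$1" to="$2" subject="$3"
  printf 'From: %s\n' "$from"
  printf 'To: %s\n' "$to"
  printf 'Subject: %s\n' "$subject"
  printf 'MIME-Version: 1.0\n'
  printf 'Content-Type: text/plain; charset=UTF-8\n'
  printf 'Content-Transfer-Encoding: 8bit\n'
  printf '\n'   # empty line separates headers from body
  cat           # copy the body from stdin unchanged
}

# Preview a message whose body contains "maçã" (UTF-8 bytes as octal escapes).
printf 'ma\303\247\303\243\n' | build_message 'sender@address.com' 'destination@address.com' 'My Subject'
```

Once the preview looks right, the same output can be piped into the sendmail invocation shown above instead of the terminal.
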
AnFi
2

RFC 2045, section 6.7, rule (5) (Soft Line Breaks): the Quoted-Printable encoding REQUIRES that encoded lines be no more than 76 characters long. The following bash script base64-encodes the subject and splits it into several short encoded words:

#!/bin/bash
subject_encoder(){
  echo -n "$1" | xxd -ps -c3 |awk -Wposix 'BEGIN{
    BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    printf " =?UTF-8?B?"; bli=8
  }
  function encodeblock (strin){
    b1=sprintf("%d","0x" substr(strin,1,2))
    b2=sprintf("%d","0x" substr(strin,3,2))
    b3=sprintf("%d","0x" substr(strin,5,2))
    o=substr(BASE64,b1/4 + 1,1) substr(BASE64,(b1%4)*16 + b2/16 + 1,1)
    len=length(strin)
    if(len>1) o=o substr(BASE64,(b2%16)*4 + b3/64 + 1,1); else o=o"="
    if(len>2) o=o substr(BASE64,b3%64 +1 ,1); else o=o"="
    return o
  }{
    bs=encodeblock($0)
    bl=length(bs)
    if((bl+bli)>64){
      printf "?=\n =?UTF-8?B?"
      bli=bl
    }
    printf bs
    bli+=bl
  }END{
    printf "?=\n"
  }'
}
SUBJECT="Relatório de utilização"
SUBJECT=`subject_encoder "${SUBJECT}"`
echo '<html>test</html>'| mail -a "Subject:${SUBJECT}" -a "MIME-Version: 1.0" -a "Content-Type: text/html; charset=UTF-8" you@domain.net
  • 1
    Hard-coding your own ad-hoc MIME encoder into each script separately is not very sustainable. The quick and dirty fix is to install e.g. `mutt` which gives you reasonable command-line control over what is being sent and how. – tripleee Jan 05 '22 at 07:29
0

This is probably not a command line issue, but a character set problem. Usually when sending E-Mails, the character set will be iso-8859-1. Most likely the text you are putting into the process is not iso-8859-1 encoded. Check out what the encoding is of whatever data source you are getting the text from.

Obligatory "good reading" link: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Re your update: In that case, if you enter the special characters manually, your terminal may be using UTF-8 encoding. You should be able to convert the file's character set using iconv for example. The alternative would be to tell mail to use UTF-8 encoding, but IIRC that is not entirely trivial.
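
The iconv conversion mentioned above could look like this. A minimal sketch, assuming the input is UTF-8 and that every character in it exists in ISO-8859-1 (iconv stops with an error otherwise); to_latin1 is a hypothetical helper name:

```shell
#!/bin/bash
# to_latin1: hypothetical helper, converts stdin from UTF-8 to ISO-8859-1.
to_latin1() {
  iconv -f UTF-8 -t ISO-8859-1
}

# "maçã" as UTF-8 is 6 bytes; after conversion it is 4 bytes (6d 61 e7 e3).
printf 'ma\303\247\303\243' | to_latin1 | od -An -tx1
```

In the pipeline from the question, the function would sit between ./command.sh and mail.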

Pekka
0

Use the option -o message-charset="utf-8" of sendemail, like this:

sendemail -f your_email -t destination_email -o message-charset="utf-8" -u "Subject" -m "Message" -s smtp-mail.outlook.com:587 -xu your_mail -xp your_password
Jeff Learman
0

I'm a bit late but none of the previous solutions worked for me.

Locating mail command (CentOS)

# locate mail | grep -v www | grep -v yum | grep -v share
# ls -l /bin/mail
lrwxrwxrwx. 1 root root 22 jul 21  2016 /bin/mail -> /etc/alternatives/mail
# ls -l /etc/alternatives/mail
lrwxrwxrwx. 1 root root 10 jul 21  2016 /etc/alternatives/mail -> /bin/mailx
# ls -l /bin/mailx
-rwxr-xr-x. 1 root root 390744 dic 16  2014 /bin/mailx

So mail command is in fact mailx. This helped with the search that finally took me to this answer at Unix&Linux Stackexchange that states:

Mailx expects input text to be in Unix format, with lines separated by newline (^J, \n) characters only. Non-Unix text files that use carriage return (^M, \r) characters in addition will be treated as binary data; to send such files as text, strip these characters e. g. by tr -d '\015'

That passage comes from the man page. The answer adds:

If there are other control characters in the file they will result on mailx treating the data as binary and will then attach it instead of using it as the body. The following will strip all special characters and place the contents of the file into the message body

So the solution is to use the tr command to remove those special characters. Something like this:

./command.sh \
| tr -cd "[:print:]\n" \
| mail -s "My Subject" destination@address.com -- -F "Sender Name" -f sender@address.com

I've used this solution with my command

grep -v "pattern" $file \
| grep -v "another pattern" \
| ... several greps more ... \
| tr -cd "[:print:]\n" \
| mail -s "$subject" -a "$file" -r "$sender" "$destination_email"
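
The two tr filters quoted above behave quite differently on accented text, which is easy to check on a sample. A minimal sketch; strip_cr and strip_nonprintable are hypothetical helper names, and the second one is pinned to the C locale so the result does not depend on the tr implementation:

```shell
#!/bin/bash
# strip_cr: hypothetical helper, removes only carriage returns (octal 015).
strip_cr() {
  tr -d '\015'
}

# strip_nonprintable: hypothetical helper. In the C locale every byte above 127
# is non-printable, so the bytes of accented characters are removed as well.
strip_nonprintable() {
  LC_ALL=C tr -cd '[:print:]\n'
}

# Sample: "maçã" in UTF-8 followed by a Windows-style line ending.
printf 'ma\303\247\303\243\r\n' | strip_cr | od -An -tx1
printf 'ma\303\247\303\243\r\n' | strip_nonprintable | od -An -tx1
```

The first filter keeps the UTF-8 bytes and drops only the carriage return; the second leaves just "ma" and the newline.
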
EAmez