8

I'm trying to make a Bash script to check if an email address is correct.

I have this regular expression:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

Source: http://www.regular-expressions.info/email.html

And this is my bash script:

regex=[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

i="test@terra.es"
if [[ $i=~$regex ]] ; then
    echo "OK"
else
    echo "not OK"
fi

The script fails and gives me this output:

10: Syntax error: EOF in backquote substitution

Any clue??

Mateusz Piotrowski
ballstud
  • Are you aware of internationalized domain names http://www.icann.org/en/topics/idn/ ? Does your regexp match test@fõõ.bâr.com ? – Jean Jan 26 '10 at 10:52
  • If you read that article you quoted thoroughly, you'll see that a) regexes will only help you to sort out blatantly illegal addresses, b) you'll either have false positives and false negatives or a completely unwieldy regex, and c) in the end, you'll have to actually send an email to that address to check whether it is not only syntactically valid but in fact correct (which no regex can tell you). – Tim Pietzcker Jan 26 '10 at 12:27
  • check out this post: http://solidlystated.com/scripting/proper-email-address-validation/ – Nam Nguyen Oct 30 '13 at 18:33
  • related http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address – Ciro Santilli OurBigBook.com May 16 '16 at 17:00

8 Answers

14

You have several problems here:

  • The regular expression needs to be quoted and special characters escaped.
  • The regular expression ought to be anchored (^ and $).
  • ?: is not supported and needs to be removed.
  • You need spaces around the =~ operator.

Final product:

regex="^[a-z0-9!#\$%&'*+/=?^_\`{|}~-]+(\.[a-z0-9!#$%&'*+/=?^_\`{|}~-]+)*@([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z0-9]([a-z0-9-]*[a-z0-9])?\$"

i="test@terra.es"
if [[ $i =~ $regex ]] ; then
    echo "OK"
else
    echo "not OK"
fi
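
To illustrate why the anchors matter, here is a minimal sketch. It uses a deliberately simplified pattern (an assumption chosen for readability, not the full expression above) and a hypothetical helper named matches. Note that [[ ... =~ ... ]] is a bash feature, so the script has to be run with bash rather than a plain sh.

```shell
#!/usr/bin/env bash
# Simplified pattern used only to illustrate anchoring (an assumption,
# not the full expression from the answer).
unanchored='[a-z0-9._-]+@[a-z0-9-]+(\.[a-z0-9-]+)+'
anchored="^${unanchored}\$"

# matches STRING REGEX (hypothetical helper): exit status 0 on a match.
matches() {
    [[ $1 =~ $2 ]]
}

input='not an address: test@terra.es !!!'
matches "$input" "$unanchored" && echo "unanchored: match"
matches "$input" "$anchored"   || echo "anchored: no match"
matches 'test@terra.es' "$anchored" && echo "anchored: match"
```

Without ^ and $ the operator succeeds as soon as any substring looks like an address, so surrounding junk goes unnoticed.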
Peter Eisentraut
10

You don't have to create such a complicated regex to check for a valid email address. You can simply split on "@" and check that there are exactly two items: one in front of the @ and one after it.

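i="test@terraes"
# Split on "@" into the positional parameters, then restore IFS.
old_ifs=$IFS
IFS="@"
set -- $i
IFS=$old_ifs
if [ "$#" -ne 2 ]; then
    echo "invalid email"
    exit 1
fi
domain="$2"
# dig reports "ANSWER: 0" when the query returns no records.
dig "$domain" | grep -q "ANSWER: 0" && echo "domain not ok"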

To check the domain further, you can use a tool like dig to query it. This is better than a regex because something like @new.jersey is matched by the regex even though it is not a real domain.
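
The same split can also be done with parameter expansion, which avoids touching IFS and the positional parameters. This is only a sketch: split_email and the globals local_part and domain are hypothetical names, and the function checks nothing beyond "exactly one @ with text on both sides".

```shell
#!/usr/bin/env bash
# split_email ADDRESS (hypothetical helper): sets the globals local_part and
# domain; returns non-zero unless there is exactly one "@" with text on both sides.
split_email() {
    local addr=$1
    local_part=${addr%@*}   # everything before the last "@"
    domain=${addr##*@}      # everything after the last "@"
    [[ $addr == *@* && $local_part != *@* && -n $local_part && -n $domain ]]
}

if split_email "test@terra.es"; then
    echo "local part: $local_part, domain: $domain"
else
    echo "invalid email"
fi
```

The resulting $domain could then be handed to dig as in the snippet above.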

Julie Pelletier
ghostdog74
  • 2
    I actually think this is a much saner approach. Think of all the websites that disallow test+marker@domain.com even though it is a perfectly valid email address. It should get rid of most fakes and still be OK. You could make it a bit stronger by checking for the presence of a '.' in the second element and making sure it separates the second element into two subelements. Think of international domains, for example. – Jean Jan 26 '10 at 10:48
  • @Jean: the second element may also contain more than two substrings separated by dots, so this is just fine, although in some cases you might want to allow addresses like `user@localhost` as well – rubo77 Aug 10 '16 at 14:30
5

Quotes, backticks and some other characters are special in shell scripts and need to be escaped when they appear in an assignment like the one to regex. You can escape special characters with backslashes, or put single quotes around the regex if you leave out the single quote used in it.

I would recommend using a simpler regular expression like .*@.* because all the complexity is futile. foo@example.com looks perfectly fine and will be accepted by any regular expression, but it still doesn't exist.
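
As a sketch of the single-quote option: the expression below is the one from the question with the apostrophe dropped from both character classes, the (?: groups written as plain groups (POSIX extended regular expressions have no non-capturing groups), and ^ and $ added. The helper name check is hypothetical.

```shell
#!/usr/bin/env bash
# Single quotes keep $, ` and \ literal, so nothing needs escaping.
# Trade-off: the apostrophe had to be dropped from both character classes.
regex='^[a-z0-9!#$%&*+/=?^_`{|}~-]+(\.[a-z0-9!#$%&*+/=?^_`{|}~-]+)*@([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z0-9]([a-z0-9-]*[a-z0-9])?$'

# check ADDRESS (hypothetical helper): print OK or not OK, as in the question.
check() {
    if [[ $1 =~ $regex ]]; then
        echo "OK"
    else
        echo "not OK"
    fi
}

check "test@terra.es"           # OK
check "test+marker@domain.com"  # OK
check "no-at-sign"              # not OK
check "o'brien@example.com"     # not OK: the apostrophe is no longer allowed
```

The last call shows the cost of this shortcut: addresses containing an apostrophe are rejected even though the original expression allowed them.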

sth
1

Coming late to the party, but I adapted a script that reads a file containing email addresses and filters it using an RFC822 regex, domain typo lists, MX lookup (thanks to eagle1 here) and ambiguous email filtering.

The script can be used like:

./emailCheck.sh /path/to/emailList

and produces two files, the filtered list and the ambiguous list. Both are already cleared of non-RFC822-compliant addresses, email domains that don't have valid MX records, and domain typos.

Script can be found here: https://github.com/deajan/linuxscripts

Corrections and comments are welcome :)

Orsiris de Jong
1

Bash version less than 3.2:

if [[ "$email" =~ "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$" ]]
then
    echo "Email address $email is valid."
else
    echo "Email address $email is invalid."
fi

Bash version greater than or equal to 3.2:

if [[ "$email" =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$ ]]
then
    echo "Email address $email is valid."
else
    echo "Email address $email is invalid."
fi

The reasons why you shouldn't use a very specific regex, like you have, are explained here.
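
One way to sidestep the version difference is to keep the pattern in a variable and expand it unquoted, which bash treats as a regular expression both before and after 3.2. The sketch below makes two assumptions that are not in the snippets above: the names email_re and check_email are hypothetical, and {2,4} is relaxed to {2,} because top-level domains longer than four letters (for example .museum) exist.

```shell
#!/usr/bin/env bash
# Pattern stored in a variable; single quotes keep it literal in the assignment.
# {2,} instead of {2,4} is a deliberate change (see the note above).
email_re='^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'

# check_email ADDRESS (hypothetical helper): report validity, return 1 if invalid.
check_email() {
    # $email_re must stay unquoted: since bash 3.2, quoting the right-hand
    # side of =~ makes it a literal string instead of a regex.
    if [[ $1 =~ $email_re ]]; then
        echo "Email address $1 is valid."
    else
        echo "Email address $1 is invalid."
        return 1
    fi
}

check_email "abc@yoyo.com"
```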

Community
rouble
  • this will fail for email=abc@yoyo.com using bash shell – Nam Nguyen Oct 22 '13 at 20:32
  • In version 3.2 of bash they changed how regexes work. To keep it short, you do not want the quotes on the regex portion of the condition. For your reference http://stackoverflow.com/questions/218156/bash-regex-with-quotes – rouble Nov 06 '13 at 05:49
0

The immediate problem with your script is you need to fix the quoting:

regex='[a-z0-9!#$%&'"'"'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'"'"'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?'

However, this regular expression does not accept all syntactically valid email addresses. Even if it did, not all syntactically valid email addresses are deliverable.

If deliverable addresses are what you care about, then don't bother with a regular expression or other means of checking syntax: send a challenge to the address that the user supplies. Be careful not to use untrusted input as part of a command invocation! With sendmail, run sendmail -oi -t and write a message to the standard input of the sendmail process, e.g.,

To: test@terra.es.invalid
From: no-reply@your.organization.invalid
Subject: email address confirmation

To confirm your address, please visit the following link:

http://www.your.organization.invalid/verify/1a456fadef213443
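
A minimal sketch of how such a message could be assembled in bash. The function name build_confirmation is hypothetical, the token is assumed to be generated and stored elsewhere, and the sendmail call is left commented out so that the sketch runs without a mail system.

```shell
#!/usr/bin/env bash
# build_confirmation RECIPIENT TOKEN (hypothetical helper): print the
# confirmation message on standard output.
build_confirmation() {
    local recipient=$1 token=$2
    # Refuse CR or LF in the address so it cannot smuggle in extra headers.
    case $recipient in
        *$'\n'*|*$'\r'*) return 1 ;;
    esac
    printf 'To: %s\n' "$recipient"
    printf 'From: %s\n' 'no-reply@your.organization.invalid'
    printf 'Subject: %s\n\n' 'email address confirmation'
    printf 'To confirm your address, please visit the following link:\n\n'
    printf 'http://www.your.organization.invalid/verify/%s\n' "$token"
}

# The untrusted address travels only on standard input, never as part of
# the command line:
# build_confirmation "$address" "$token" | sendmail -oi -t
```

Using printf with a fixed format string keeps the user-supplied value out of both the format and the command invocation.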
Greg Bacon
0

I've combined the examples above into a single script that checks the address against the regexp and uses dig to check that the domain actually exists; otherwise it reports an error.

#!/bin/bash
#Regexp
regex="^[a-z0-9!#\$%&'*+/=?^_\`{|}~-]+(\.[a-z0-9!#$%&'*+/=?^_\`{|}~-]+)*@([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z0-9]([a-z0-9-]*[a-z0-9])?\$"

#Vars
checkdig=0;
checkreg=0;
address=$1;
maildomain=${address#*@};   # everything after the first "@"

#Domain Check
checkdns() {
        # $maildomain already holds the domain, so query it directly.
        dig "$maildomain" | grep -q "ANSWER: 0" || checkdig=1;
}

#Regexp
checkreg() {
        if [[ $address =~ $regex ]] ;
                then checkreg=1;
        fi
}

#Execute
checkreg;
checkdns;

#Results
if [ $checkreg == 1 ] && [ $checkdig == 1 ];
        then    echo "OK";
        else    echo "not OK";
fi
#End

Nothing special.
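
If the goal is specifically to find out whether the domain advertises a mail server, a variant of the DNS check could ask for MX records instead. This is a sketch under assumptions: domain_has_mx and the MX_LOOKUP hook are hypothetical names, and the result is only a heuristic, since a domain without MX records can still receive mail through its address record.

```shell
#!/usr/bin/env bash
# MX_LOOKUP is a hypothetical hook naming the lookup command; overriding it
# lets the logic be exercised without network access.
MX_LOOKUP=${MX_LOOKUP:-dig +short MX}

# domain_has_mx DOMAIN (hypothetical helper): succeed if the lookup prints
# at least one record.
domain_has_mx() {
    local records
    # Drop dig's ";;" diagnostic lines so a timeout is not mistaken for a record.
    records=$($MX_LOOKUP "$1" 2>/dev/null | grep -v '^;')
    [[ -n $records ]]
}

# Example (needs network access):
# domain_has_mx "terra.es" && echo "domain ok" || echo "domain not ok"
```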

eagle1
0

In a moment of madness once, I wrote this Perl subroutine based on the Mastering Regular Expressions book:

sub getRFC822AddressSpec
{
    my ($esc, $space, $tab, $period) = ('\\\\', '\040', '\t', '\.');
    my ($lBr, $rBr, $lPa, $rPa)      = ('\[', '\]', '\(', '\)');
    my ($nonAscii, $ctrl, $CRlist)   = ('\200-\377', '\000-\037', '\n\015');

    my $qtext       = qq{ [^$esc$nonAscii$CRlist] }; # within "..."
    my $dtext       = qq{ [^$esc$nonAscii$CRlist$lBr$rBr] }; # within [...]
    my $ctext       = qq{ [^$esc$nonAscii$CRlist()] }; # within (...)
    my $quoted_pair = qq{ $esc [^$nonAscii] }; # an escaped char
    my $atom_char   = qq{ [^()$space<>\@,;:".$esc$lBr$rBr$ctrl$nonAscii] };
    my $atom        = qq{ $atom_char+     # some atom chars
                          (?!$atom_char)  # NOT followed by part of an atom
                        };
    # rfc822 comments are (enclosed (in parentheses) like this)
    my $cNested     = qq{ $lPa (?: $ctext | $quoted_pair )* $rPa };
    my $comment     = qq{ $lPa (?: $ctext | $quoted_pair | $cNested )* $rPa };

    # whitespace and comments may be scattered liberally
    my $X           = qq{ (?: [$space$tab] | $comment )* };

    my $quoted_str  = qq{ " (?: $qtext | $quoted_pair )* " };
    my $word        = qq{ (?: $atom | $quoted_str ) };
    my $domain_ref  = $atom;
    my $domain_lit  = qq{ $lBr (?: $dtext | $quoted_pair )* $rBr };
    my $sub_domain  = qq{ (?: $domain_ref | $domain_lit ) };
    my $domain      = qq{ $sub_domain (?: $X $period $X $sub_domain )* };
    my $local_part  = qq{ $word (?: $X $period $X $word )* };
    my $addr_spec   = qq{ $local_part $X \@ $X $domain };

    # return a regular expression object
    return qr{$addr_spec}ox;
}

my $spec = getRFC822AddressSpec();
my $address = q{foo (Mr. John Foo) @ bar. example};
print "$address is an email address" if ($address =~ qr{$spec});
glenn jackman