In: johnson@email.com...

I need to make sure that 'johnson' is valid by server/client standards (probably RFC 5322) for the username part of an email address. That is, Gmail and Thunderbird would accept it.

This question addresses full email addresses, which I don't need: How can I validate an email address using a regular expression?

This unpopular question is about JavaScript and doesn't have answers: Validating username part of email address

This answer to the first question above offers a semi-acceptable regex for a full email address, which I don't need. There might be room for improvement, though I might not need it: https://stackoverflow.com/a/201378/10343144

My current best solution would be to take:

emailusername="$1"
testemail="${emailusername}@nonsense.com"
regex='some existing full-email regex'
if [[ "${testemail}" =~ ${regex} ]]; then
  echo "it works"
fi

But, that doesn't address the email username part specifically, and it's more expensive than validating only the username part.

Is adding a nonsense domain to the username for a regex check the best way?

Or is there a regex that can handle only the username part of an email address?

Jesse
  • Gmail only allows letters and numbers, delimited by single dots (must start/end with letter/number, no contiguous dots). At least for new sign ups for the free tier. If that's your constraint, the regex is `[[:alnum:]]+(\.[[:alnum:]]+)*`. – dan Jun 19 '22 at 06:50
  • If you want to use the regex you linked to, just take the portion before `@`. You can use `grep -P`, or convert the regex from perl to bash, which I think can be done with the right quoting. – dan Jun 19 '22 at 07:07

2 Answers

1

Gmail is more restrictive than the RFC in respect of the accepted usernames (see Create a username):

  • “Abuse” and “Postmaster” are reserved
  • 6–30 characters long
  • can contain letters (a-z), numbers (0-9), and periods (.)
  • cannot contain [...] more than one period (.) in a row
  • can begin or end with [...] except periods (.)
  • periods (dots) don’t matter in Gmail addresses

Remark: dots are not counted toward the length of a username.

Then, for validating a Google username with bash you could do:

#!/bin/bash

username="$1"
username_nodots="${username//./}" 

if ! {
    (( ${#username_nodots} >=  6 )) && # this rule also excludes 'Abuse' 
    (( ${#username_nodots} <= 30 )) &&
    [[ $username =~ ^[[:alnum:]]+(\.[[:alnum:]]+)*$ ]] &&
    [[ $username != 'Postmaster' ]]
}
then
    echo "error: illegal google username: $username" >&2
    exit 1
fi

Edit: following @tripleee's advice, i.e. using only standard shell constructs:

username="$1"
length=$(printf %s "$username" | tr -d '.' | wc -c)

[ "$length" -ge 6 ] || {
    printf '%s\n' 'too short' >&2
    exit 1
}
[ "$length" -le 30 ] || {
    printf '%s\n' 'too long' >&2
    exit 1
}
case $username in
    # POSIX sh negates a bracket expression with '!'; a leading '^' is unspecified
    *[![:alnum:].]*)
        printf '%s\n' 'illegal character' >&2
        exit 1
    ;;
    .*)
        printf '%s\n' 'starts with dot' >&2
        exit 1
    ;;
    *.)
        printf '%s\n' 'ends with dot' >&2
        exit 1
    ;;
    *..*)
        printf '%s\n' 'multiple dots in a row' >&2
        exit 1
    ;;
    Abuse|Postmaster)
        printf '%s\n' 'reserved username' >&2
        exit 1
    ;;
esac
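
To reuse these checks from a larger script, the same rules could be wrapped in a function that returns a status instead of exiting. This is only a sketch: the name `is_gmail_username` is made up for illustration, and comparing the reserved names case-insensitively (after stripping dots) is an assumption, not something the Google page spells out.

```bash
#!/bin/bash

# Hypothetical helper (not part of the scripts above): succeeds when "$1"
# passes the Gmail sign-up rules quoted at the top of this answer.
is_gmail_username() {
    local username=$1
    local nodots=${username//./}      # dots do not count toward the length

    (( ${#nodots} >= 6 && ${#nodots} <= 30 )) || return 1
    [[ $username =~ ^[[:alnum:]]+(\.[[:alnum:]]+)*$ ]] || return 1

    # Assumption: reserved names are matched case-insensitively and with
    # dots ignored. ${var,,} lowercases and needs bash 4 or later.
    case ${nodots,,} in
        abuse|postmaster) return 1 ;;
    esac
    return 0
}

# Try a few sample names.
for name in johnson john.smith j.doe .johnson john..smith Postmaster john_smith; do
    if is_gmail_username "$name"; then
        printf '%-12s ok\n' "$name"
    else
        printf '%-12s rejected\n' "$name"
    fi
done
```

With these samples, `johnson` and `john.smith` should be reported as ok and the rest as rejected. Note that `[[:alnum:]]` follows the current locale, so running under `LC_ALL=C` keeps it to ASCII letters and digits.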
Fravadona
  • I really appreciate the head of this answer, that there is a major difference, even in what ICANN allows for domains and what browsers will actually understand. For instance, surely '@' can't be part of the username. So, the type of username that would be "deliverable to Gmail" probably makes the most sense. Your opening comments are very helpful and relevant. – Jesse Jun 19 '22 at 09:39
  • The trouble, though, with the minimum 6 char rule is that my own .NET id has a two-character username in the email. But, I like that in your answer, along with the comment as to why. It's a really good way of putting it. – Jesse Jun 19 '22 at 09:41
  • I plan to test this later in my code, then come back and select a correct answer if something works, then I'll delete this comment myself. Mods please leave this comment so users know why there is not a checked-right answer until then. ty – Jesse Jun 19 '22 at 09:42
  • As already explained in previous comments, Google is more restrictive than the RFC. Even `*@name` is a valid email address in the `name` top-level domain; that doesn't mean Google will permit `*` in a localpart. If two-letter strings are permitted, chances are they have already been reserved long ago anyway. – tripleee Jun 19 '22 at 10:36
  • I would perhaps regard a `case` statement as both simpler, more legible, and more portable. You can't use regex then, but refactoring to check some simple constraints using just glob wildcards is often quite feasible. – tripleee Jun 19 '22 at 10:39
  • @tripleee I wouldn't say that it's simpler to use a `case` statement but it surely invites to write a more granular error reporting; for example `^[[:alnum:]]+(\.[[:alnum:]]+)*$` won't differentiate between the presence of an illegal character, multiple continuous dots and a dot at the start/end of the string. – Fravadona Jun 19 '22 at 14:26
0

If your locale is C the following may work for you. It is inspired by the last regex you mention (which has not been checked against the RFC), and was not extensively tested:

atext="A-Za-z0-9!#\$%&'*+/=?^_\`{|}~-"
qs1=$'\x09\x0a\x0d\x20\x22\x5c\x80-\xff'
qs2=$'\x0a\x0d\x80-\xff'
[[ "$localpart" =~ ^([$atext]+(\.[$atext]+)*|\"([^$qs1]|\\[^$qs2])*\")$ ]] && echo "yes"
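
If quoted local parts are not needed, the first alternative of the regex above (the dot-separated `atext` form) might be enough on its own. Below is a minimal sketch under that assumption. The function name `is_dot_atom_localpart` is hypothetical, and the 64-octet cap comes from RFC 5321 rather than from the regex above. Like the original, it assumes the C locale and has not been extensively tested.

```bash
#!/bin/bash

# Hypothetical helper: succeeds when "$1" looks like an unquoted (dot-atom)
# local part. Quoted local parts such as "john smith" are deliberately
# not handled here.
is_dot_atom_localpart() {
    local atext="A-Za-z0-9!#\$%&'*+/=?^_\`{|}~-"
    # Storing the regex in a variable avoids quoting problems inside [[ ]].
    local re="^[$atext]+(\\.[$atext]+)*\$"

    [[ $1 =~ $re ]] || return 1
    # RFC 5321 limits the local part to 64 octets. In the C locale the regex
    # only admits ASCII, so the character count equals the octet count.
    (( ${#1} <= 64 ))
}

# Try a few sample local parts.
for localpart in johnson "o'brien+tag" john..smith .john john@smith; do
    if is_dot_atom_localpart "$localpart"; then
        printf '%s: yes\n' "$localpart"
    else
        printf '%s: no\n' "$localpart"
    fi
done
```

This accepts far more than Gmail does (for example `+`, `'` and `_`), so it answers "is this syntactically plausible?" rather than "would a given provider let me register it?".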
Renaud Pacalet