Validating input to follow a specific order

Question

I have to make sure that an email address that is inputted is valid. An email address must:

Start with a string of alphanumeric characters
followed by the @ symbol
another string of alphanumeric characters
followed by a .
then a string of alphanumeric characters.

For example, a@b.c and ab23@f45.d3 are both valid, but @bc.d and 123.c@cvb are not valid. How would I program something that would make sure the input follows this order?

just build a regex to validate it, keep asking for input until it satisfies the regex — Netwave, Jul 11 '16 at 08:21
use Google `python regex email` and you find: http://stackoverflow.com/questions/8022530/python-check-for-valid-email-address — Jens, Jul 11 '16 at 08:22
Note that your current definition does not actually match all valid email addresses. Also you apparently haven't written anything, so maybe start there. — jonrsharpe, Jul 11 '16 at 08:23
if you wanna be serious about that, you have to implement that: https://en.wikipedia.org/wiki/Email_address (see the Syntax paragraph) — Ma0, Jul 11 '16 at 08:59
Why, yes you're right. I haven't written anything because I don't know how to do it, which is why I posted it on here? Damn, this community is vicious. — Pelican, Jul 11 '16 at 19:43
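Netwave's suggestion of re-prompting until the input passes can be kept separate from the check itself. The following is a minimal sketch, assuming a hypothetical `prompt_until_valid` helper that takes any validator function. The `read` parameter defaults to the built-in `input` and exists only so the loop can be exercised without a keyboard. The pattern encodes the rules from the question, read as ASCII letters and digits only:

```python
import re
from typing import Callable

# Rules from the question: letters/digits, '@', letters/digits, '.', letters/digits.
EMAIL_PATTERN = re.compile(r'[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+')


def is_valid(email: str) -> bool:
    # fullmatch() must cover the whole string, so no ^ or $ anchors are needed.
    return EMAIL_PATTERN.fullmatch(email) is not None


def prompt_until_valid(validate: Callable[[str], bool],
                       read: Callable[[str], str] = input) -> str:
    """Ask repeatedly until `validate` accepts the answer, then return it."""
    while True:
        answer = read('Email: ')
        if validate(answer):
            return answer
        print('Invalid email, please try again.')
```

Calling `prompt_until_valid(is_valid)` reads from the keyboard. Passing a different `read` function lets the loop run in a test.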

score 2 · Answer 1 · answered Jul 11 '16 at 08:37


Use a regular expression.

An example:

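import re

email = input('Email: ')
if re.search(r'[\w.-]+@[\w.-]+\.\w+', email):  # '\.' matches a literal dot
    print('Valid email')    # do stuff
else:
    print('Invalid email')  # do other stuff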

MCSH

Thanks, this works, but I've cut the search down to this: re.search(r'[\w]@[\w].[\w]',email) and it still works. It seems like a shorter way to write it, is it still as good? – Pelican Jul 11 '16 at 20:24

@MCSH: For matching the OP's restricted requirements the suggested pattern is wrong. It accepts periods and hyphens in the first and second segments, so email addresses such as `a....---.@b....c.d.a.s` would be accepted. `\w` also accepts the underscore character `_`, as well as other Unicode letters and digits when `re.UNICODE` is in effect (the default for `str` patterns in Python 3). Furthermore, `re.search()` will match anywhere within the given string, so `#*&^812a....---.@b....c.d.a.s()()*&` will match. Use `^` and `$` at the beginning and end of the pattern to match from the start to the end of the address, or use `re.match()` with the pattern ending in `$`. – mhawke Jul 12 '16 at 09:45
@Pelican: You need to specify repetition within the pattern, i.e. `[\w]+` where the `+` says "one or more" of the previous pattern. `\w` also matches underscore so you can't use that; use `[a-zA-Z\d]+` instead. The `.` needs to be escaped so that it is treated as a literal `.` rather than matching any single character. Finally, anchor the search with `^` and `$` so that extraneous characters before and after the address don't cause a match. The pattern should be: `r'^[a-zA-Z\d]+@[a-zA-Z\d]+\.[a-zA-Z\d]+$'`. See [my answer](http://stackoverflow.com/a/38318090/21945) for code, and a non-regex alternative – mhawke Jul 12 '16 at 09:55
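The points raised in these comments can be checked directly. This sketch runs three patterns against a few of the strings mentioned above: the answer's pattern (with the `.` before the last segment escaped), Pelican's shortened pattern, and the anchored pattern from mhawke's comment. The sample `x_@_!z` is an illustrative string, not one from the thread:

```python
import re

PATTERNS = {
    'answer': r'[\w.-]+@[\w.-]+\.\w+',                       # the answer's pattern, '.' escaped
    'shortened': r'[\w]@[\w].[\w]',                          # Pelican's cut-down version
    'anchored': r'^[a-zA-Z\d]+@[a-zA-Z\d]+\.[a-zA-Z\d]+$',  # mhawke's suggestion
}

SAMPLES = [
    'ab23@f45.d3',
    'a....---.@b....c.d.a.s',
    '#*&^812a....---.@b....c.d.a.s()()*&',
    'x_@_!z',  # underscores, and a '!' where the dot should be
]


def matches(pattern, text):
    # re.search() scans the whole string, so an unanchored pattern can match a substring.
    return re.search(pattern, text) is not None


for name, pattern in PATTERNS.items():
    print(name, [matches(pattern, s) for s in SAMPLES])
```

Only the anchored pattern rejects every malformed sample. The unescaped `.` in the shortened pattern is what lets `x_@_!z` through.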

Ted Klein Bergman · Answer 2 · 2016-07-11T09:22:02.527

I would split the string at the @ character into two new strings and check that the string to the left of @ contains only alphanumeric characters. Then I would split the string on the right at the . character and check that both the left and right strings contain only alphanumeric characters.

def test(email):
    # Unpacking raises ValueError unless there is exactly one '@'.
    try:
        left, right = email.split('@')  # 'abc123@mail1.com' -> left = 'abc123', right = 'mail1.com'
    except ValueError:
        return False
    if not left.isalnum():  # If the part left of '@' is empty or not alphanumeric, return False.
        return False

    # Unpacking raises ValueError unless the domain has exactly one '.'.
    try:
        left, rest = right.split('.')  # 'mail1.com' -> left = 'mail1', rest = 'com'
    except ValueError:
        return False
    if not (left.isalnum() and rest.isalnum()):  # If either side of '.' is empty or not alphanumeric, return False.
        return False

    return True  # Every check passed.


# Interactive check; it gives the expected results for the examples in the question.
while True:
    print(test(input('Email: ')))
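One caveat with `str.isalnum()`: it is Unicode-aware, so a letter such as `é` also counts as alphanumeric. If the requirement is strictly ASCII letters and digits, the same split-based idea can add `str.isascii()` (available from Python 3.7). The following is a sketch under that assumption. The names `is_ascii_alnum` and `validate_ascii` are hypothetical, and `str.partition` is used so that a missing `@` or `.` never raises:

```python
def is_ascii_alnum(part):
    # isascii() rejects non-ASCII characters; isalnum() rejects empty strings
    # and punctuation such as '_', '.', '&' and '@'.
    return part.isascii() and part.isalnum()


def validate_ascii(email):
    # partition() never raises: if '@' is missing, sep comes back empty.
    user, sep, domain = email.partition('@')
    if not sep or not is_ascii_alnum(user):
        return False
    # Split the domain at its first '.'; any further '.' ends up in tld and fails.
    name, sep, tld = domain.partition('.')
    return bool(sep) and is_ascii_alnum(name) and is_ascii_alnum(tld)
```

With this check `café@x.com` is rejected, whereas a plain `isalnum()` test accepts it.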

Ma0 · Answer 3 · 2016-07-12T08:16:23.290

This is my take on this:

def email_valid(email_str):
    # Exactly one '@' and exactly one '.' must appear in the string.
    if email_str.count('@') != 1 or email_str.count('.') != 1:
        return False
    user, domain = email_str.split('@')
    # Domain pieces followed by the user part; ''.isalnum() is False,
    # so empty parts are rejected here as well.
    parts = domain.split('.') + [user]
    return all(x.isalnum() for x in parts)

check = False
while not check:
    check = email_valid(input('Please provide an email:\t'))

print('Email accepted!')

This checks that '@' and '.' each appear exactly once in the provided string, and that the parts before and after them are alphanumeric and non-empty.

However, the rules implemented here are not the rules generally applied to email addresses. For those, see the Syntax section of the Wikipedia article on email addresses (https://en.wikipedia.org/wiki/Email_address).

I think that you meant `all()`, not `any()`. That only checks that at least one part is alphanumeric, so the other parts could contain invalid characters; e.g. `123.c@cvb` is considered valid, as is `ab_()@x.com`, which is clearly invalid. — mhawke, Jul 12 '16 at 00:43
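The difference mhawke points out shows up with the parts that Ma0's function builds for `123.c@cvb`: the domain split on `.`, followed by the user part. This short sketch rebuilds those parts and applies both checks:

```python
email = '123.c@cvb'
user, domain = email.split('@')
parts = domain.split('.') + [user]  # ['cvb', '123.c'], built the same way as in the answer

# any() needs only one alphanumeric part, so the invalid user part slips through.
print(any(x.isalnum() for x in parts))
# all() requires every part to be alphanumeric, so the address is rejected.
print(all(x.isalnum() for x in parts))
```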

mhawke · Answer 4 · 2016-07-12T10:08:17.617

Here's another non-regex way to do it:

def validate_email(email):
    user, sep, domain = email.partition('@')
    parts = [user]
    parts.extend(domain.split('.'))
    return len(parts) == 3 and all(part.isalnum() for part in parts)

>>> for email in 'a@b.c', 'ab23@f45.d3', 'a_b@p_q.com', '@bc.d', '123.c@cvb', '', '@', 'a@b@c', '@.', 'abc&**&@test.com':
...     print(validate_email(email)) 
True
True
False
False
False
False
False
False
False
False

The domain part of the email address is restricted to two parts separated by a `.`. Valid email domains can have three or more parts. If you want to support that, change the test to `len(parts) >= 3`. Removing it entirely would also accept a domain with no `.` at all, such as `a@b`.
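This sketch shows a variant that accepts multi-part domains while still requiring at least one `.` in the domain. The function name `validate_email_multi` is illustrative only:

```python
def validate_email_multi(email):
    # partition() splits at the first '@'; any later '@' stays in the domain.
    user, _, domain = email.partition('@')
    # The user part followed by every dot-separated piece of the domain.
    parts = [user] + domain.split('.')
    # At least three parts means the domain contains at least one '.'.
    # isalnum() is False for '' and for pieces containing '@', '_', '&', etc.
    return len(parts) >= 3 and all(part.isalnum() for part in parts)
```

For example, `user@mail.example.co.uk` passes. `a@b`, `a@b..c` and `a@b@c.d` are rejected.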

And here is a regex pattern that works:

import re

def validate_email(email):
    return re.match(r'[a-zA-Z\d]+@[a-zA-Z\d]+\.[a-zA-Z\d]+$', email) is not None

>>> for email in 'a@b.c', 'ab23@f45.d3', 'a_b@p_q.com', '@bc.d', '123.c@cvb', '', '@', 'a@b@c', '@.', 'abc&**&@test.com':
...     print(validate_email(email)) 
True
True
False
False
False
False
False
False
False
False

You can't use `\w` in the pattern because it also matches the underscore character, which is not normally considered alphanumeric. The `$` is required at the end of the pattern to ensure that the last segment of the email address ends with alphanumeric characters only. Without it, a string with invalid characters after a valid sequence, such as `a@b.c&&`, would still match.

In this case I'd opt for the first method using just basic string functions because it is (arguably) easier to read and maintain than a regex.
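One further detail about the `$` anchor in the regex version: in Python's `re` module, `$` matches at the end of the string *or* just before a newline at the very end. This means that `re.match()` with a trailing `$` still accepts `'a@b.c\n'`. `re.fullmatch()` (Python 3.4+) requires the match to cover the entire string. This sketch compares the two approaches, and the function names are illustrative:

```python
import re

PATTERN = r'[a-zA-Z\d]+@[a-zA-Z\d]+\.[a-zA-Z\d]+'


def with_dollar(email):
    # '$' also matches just before a trailing '\n'.
    return re.match(PATTERN + '$', email) is not None


def with_fullmatch(email):
    # fullmatch() succeeds only if the entire string matches the pattern.
    return re.fullmatch(PATTERN, email) is not None


for email in ['a@b.c', 'a@b.c\n']:
    print(repr(email), with_dollar(email), with_fullmatch(email))
```

For text read with `input()` this rarely matters, because `input()` strips the trailing newline. It does matter for lines read from a file.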
