Python: Regex Function Parsing Through Email String and Returning Tuple Or Returning ValueError If Input Invalid

Question

I want to write a function that parses an email input and returns a tuple (id, domain), where id is the user name and domain is the domain name. The two parts are separated by the @ character.

For example: kyle@asu.edu would parse to ('kyle', 'asu.edu'). But below are some additional constraints on the function:

username begins with alphabetic character
domain name ends with alphabetic character
special characters such as ., -, _, or + are allowed
no whitespace characters permitted including no leading or trailing whitespaces

So if any of the above rules are violated, then the email input is not considered a valid email address and should raise a ValueError.

Below is my attempted code that doesn't quite work:

def email_func(string_input):
    """Parses a string as an email address, returning an (id, domain) pair."""
    ###
    ### YOUR CODE HERE
    regex_parse = re.search(r'([a-zA-Z_+-.]+)@([a-zA-Z.-]+)', string_input) 
    # print (regex_parse)
    
    try:
        return regex_parse.groups()
    
    except ValueError:
        raise ValueError ('not a valid email address')
    ###

For a simple example it works.

email_func('kyle@asu.edu') returns `('kyle', 'asu.edu')` which is correct.

Instances where my code doesn't work:

For invalid input strings containing whitespace, I'm not raising a ValueError. For example, email_func('kyle @asu.edu') produces a different error:

---> 11 return regex_parse.groups()
AttributeError: 'NoneType' object has no attribute 'groups'

I'm not getting a ValueError for leading whitespace. For example, email_func(' kyle@asu.edu') outputs ('kyle', 'asu.edu'). I have the same issue with trailing whitespace.

How do I specify in my regex that the email can't start or end with a number, i.e. that it has to start and end with an alphabetic character?

Why are you expecting `ValueError`? When `re.search()` can't find a match it returns `None`. `None.groups()` raises `AttributeError`, not `ValueError`. — Barmar, Feb 19 '21 at 16:42
hm, I didn't realize that. I will look into the difference between the two errors. — PineNuts0, Feb 19 '21 at 16:44

score 1 · Answer 1 · edited Feb 19 '21 at 17:02

As the traceback shows, calling regex_parse.groups() raises AttributeError, not ValueError, when regex_parse is None, which is what re.search() returns when it can't find a match. So change except ValueError: to except AttributeError:. Or you could simply check for None explicitly:

if regex_parse is None: 
    raise ValueError("Not a valid email address")

You should anchor your regexp so it has to match the entire string, rather than searching for a match anywhere in the string: r'^([a-zA-Z_+-.]+)@([a-zA-Z.-]+)$'. ^ matches the beginning, $ matches the end.

To require a letter at the start of the username and at the end of the domain, start and end the regexp with [a-zA-Z]. Combined with the anchors, that gives:
r'^([a-zA-Z][a-zA-Z_+-.]*)@([a-zA-Z.-]*[a-zA-Z])$'
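Putting these points together, a minimal sketch of the whole function might look like the following. It makes two assumptions that go beyond the answer: it uses re.fullmatch() instead of ^ and $ (since $ also matches just before a trailing newline), and it moves - to the end of each character class, because +-. inside brackets is a range that also admits a comma. The EMAIL_RE name is only illustrative. Like the patterns above, it does not accept digits.

```python
import re

# Assumption: '-' is placed last in each class so it is a literal hyphen;
# written as '[+-.]' it would be a range covering '+', ',', '-' and '.'.
EMAIL_RE = re.compile(r'([a-zA-Z][a-zA-Z_+.-]*)@([a-zA-Z.-]*[a-zA-Z])')


def email_func(string_input):
    """Parses a string as an email address, returning an (id, domain) pair."""
    # fullmatch() succeeds only if the entire string matches the pattern,
    # so leading or trailing whitespace makes it return None.
    regex_parse = EMAIL_RE.fullmatch(string_input)
    if regex_parse is None:
        raise ValueError('not a valid email address')
    return regex_parse.groups()
```

With this version, email_func('kyle@asu.edu') returns ('kyle', 'asu.edu'), while inputs such as 'kyle @asu.edu', ' kyle@asu.edu', 'kyle@asu.edu ' and '1kyle@asu.edu' raise ValueError.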

edited Feb 19 '21 at 17:02

mypetlion

answered Feb 19 '21 at 16:48

Barmar

It worked! thank you :) Do you have any recommended sites to practice regex problems? – PineNuts0 Feb 19 '21 at 16:58
regex101.com is good. It displays the breakdown of what each part of the regexp means. – Barmar Feb 19 '21 at 17:04

score 1 · Answer 2 · answered Feb 19 '21 at 17:00

I assume that your input is only one email address and you need to validate it. So there is no need to use search. What you are really looking for is the match function.

With small changes to your code, it looks like this:

import re

def email_func(string_input):
    """Parses a string as an email address, returning an (id, domain) pair."""
    ###
    ### YOUR CODE HERE
    # re.match only anchors the pattern at the start of the string
    regex_parse = re.match(r'([a-zA-Z_+-.]+)@([a-zA-Z.-]+)', string_input)
    # print (regex_parse)

    if regex_parse:
        return regex_parse.groups()

    else:
        raise ValueError('not a valid email address')
    ###

answered Feb 19 '21 at 17:00

Yulian

ah, yes ... can you explain a bit when would be appropriate to use search vs. match conceptually please? – PineNuts0 Feb 19 '21 at 17:05
It is better explained in [this answer](https://stackoverflow.com/a/180993/7769691). – Yulian Feb 19 '21 at 17:08
In general, the "search" function scans the entire string for the first substring that matches the pattern, like Ctrl+F, while the "match" function only checks whether the pattern matches at the beginning of the string. To require that the whole string matches the pattern, use "fullmatch". – Yulian Feb 19 '21 at 17:11
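A small sketch can make the difference between the three functions concrete. The pattern here is a simplified, hypothetical one chosen only for illustration, and which_match is a made-up helper name: re.search looks anywhere in the string, re.match requires the match to begin at position 0 but lets extra text follow it, and re.fullmatch requires the match to cover the whole string.

```python
import re

# Simplified, hypothetical pattern used only to compare the three functions.
pattern = r'[a-z]+@[a-z.]+'


def which_match(text):
    """Report which of search/match/fullmatch accept `text`."""
    return {
        'search': re.search(pattern, text) is not None,
        'match': re.match(pattern, text) is not None,
        'fullmatch': re.fullmatch(pattern, text) is not None,
    }


leading = which_match(' kyle@asu.edu')   # accepted by search only
trailing = which_match('kyle@asu.edu ')  # accepted by search and match
exact = which_match('kyle@asu.edu')      # accepted by all three
```

For validating a single input string, that makes fullmatch (or a pattern anchored at both ends) the safer choice.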

score 0 · Answer 3 · answered Mar 09 '21 at 23:39

The function below should return a tuple (username, domain). The except block is triggered if either the username pattern or the domain pattern fails to match. Modify as needed.

import re

def email_func(string_input):

    username_template = r'^[a-z][a-z\d_\.+=-]{0,30}@'
    domain_template = r'@\w*\.[a-z]+'

    try:
        username = re.search(username_template, string_input)
        domain = re.search(domain_template, string_input)

        # group(0) includes the '@', so strip it from both parts
        email_tuple = (username.group(0).strip('@'), domain.group(0).strip('@'))

        return email_tuple

    except AttributeError:
        # re.search returned None for at least one of the templates
        raise ValueError('please enter a valid email.')

To capture the username, the regex below checks that it starts with a lowercase Latin letter (^[a-z]), followed by up to 30 more characters from the allowed set and then @. If the string does not start with a letter, the except block is triggered.

username_template = r'^[a-z][a-z\d_\.+=-]{0,30}@'

To capture the domain, the regex below checks that the characters following the . are lowercase Latin letters and not digits: \.[a-z]+

domain_template = r'@\w*\.[a-z]+'
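One caveat worth checking: domain_template is not anchored at the end, so re.search stops at the first stretch that fits. The sketch below compares it with a stricter variant. The strict_domain_template pattern and the split_email helper are assumptions added for illustration, not part of the answer; the variant allows several dot-separated labels and uses \Z to require that the match reaches the end of the string.

```python
import re

# Templates as given in the answer.
username_template = r'^[a-z][a-z\d_\.+=-]{0,30}@'
domain_template = r'@\w*\.[a-z]+'

# Hypothetical stricter variant: one or more dot-separated labels, then
# letters only, and nothing after them (\Z matches only at the very end).
strict_domain_template = r'@(?:[\w-]+\.)+[a-z]+\Z'


def split_email(string_input, domain_pattern):
    """Return (username, domain) or raise ValueError if a pattern fails."""
    username = re.search(username_template, string_input)
    domain = re.search(domain_pattern, string_input)
    if username is None or domain is None:
        raise ValueError('not a valid email address')
    # group(0) includes the '@', so strip it from both parts.
    return (username.group(0).strip('@'), domain.group(0).strip('@'))


# The unanchored template cuts a multi-label domain short ...
truncated = split_email('kyle@mail.asu.edu', domain_template)
# ... while the stricter variant keeps all of it.
complete = split_email('kyle@mail.asu.edu', strict_domain_template)
```

Here truncated is ('kyle', 'mail.asu') and complete is ('kyle', 'mail.asu.edu'). The unanchored template also accepts 'kyle@asu.edu ' with its trailing space, whereas the stricter variant raises ValueError for it.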
