How to write a regex script that accounts for a US telephone number in addition to edge cases and common typos

Question

I'm working in Python 3 and trying to match a US telephone number, including edge cases and common typos that might appear. I need to handle a variety of inputs and can't simply reject a malformed number as long as it is 10 digits long. So far I've been writing a separate pattern for each scenario, but I was wondering whether there is a simpler or more straightforward way of doing this. I'm also not sure whether there is a good way (or at least a standard way) of accounting for possible whitespace. Here's what I have so far:

#Using regex to capture different phone number formats:
^[2-9]\d{2}-\d{3}-\d{4}$         #matches a phone number in the format ANN-NNN-NNNN, where A must be between 2 and 9 and N must be between 0 and 9.
^\([2-9]\d{2}\)-\d{3}-\d{4}$     #for (ANN)-NNN-NNNN
#Edge cases:
^\([2-9]\d{2}-\d{3}-\d{4}$       #for (ANN-NNN-NNNN 
^[2-9]\d{2}\)-\d{3}-\d{4}$       #for ANN)-NNN-NNNN
^[2-9]\d{2}-\d{3}\d{4}$          #for ANN-NNNNNNN 
^\([2-9]\d{2}\)-\d{3}\d{4}$      #for (ANN)-NNNNNNN
^[2-9]\d{2}\d{3}-\d{4}$          #for ANNNNN-NNNN
^\([2-9]\d{2}\)\d{3}-\d{4}$      #for (ANN)NNN-NNNN
^[2-9]\d{2}\d{3}\d{4}$           #for ANNNNNNNNN 
^\([2-9]\d{2}\)\d{3}\d{4}$       #for (ANN)NNNNNNN

Have you considered using pyparsing? It might be a lot easier. – Alex Zisman Dec 15 '17 at 22:32

score 0 · Accepted Answer · answered Dec 15 '17 at 22:33

The fix to cover all the edge cases is simple: make each of (, ) and - optional by adding ? after it:

import re

# Sample inputs covering every format listed in the question
test = ['333-333-3333', '(333)-333-3333', '(333-333-3333', '333)-333-3333', '333-3333333', '(333)-3333333', '333333-3333', '(333)333-3333', '3333333333', '(333)3333333']

# Raw string so the backslashes reach the regex engine unchanged
pattern = r"^\(?[2-9]\d{2}\)?-?\d{3}-?\d{4}$"

[bool(re.match(pattern, x)) for x in test]
# [True, True, True, True, True, True, True, True, True, True]
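The question also asks about whitespace. The sketch below is one possible extension of the same idea, not part of the original answer: it assumes each separator may be any run of hyphens and whitespace (or nothing at all), and that stray spaces around the number should be tolerated. The names PHONE_RE and looks_like_us_phone are made up for illustration.

```python
import re

# Assumption: between the digit groups we accept any mix of hyphens and
# whitespace (including none), and we ignore leading/trailing whitespace.
PHONE_RE = re.compile(r"^\s*\(?[2-9]\d{2}\)?[-\s]*\d{3}[-\s]*\d{4}\s*$")


def looks_like_us_phone(text):
    """Return True if text matches the lenient pattern above (hypothetical helper)."""
    return PHONE_RE.match(text) is not None


samples = [
    "(333) 333 3333",   # spaces as separators
    "(333)-333 3333",   # mixed hyphen and space
    "333-333-3333",     # the plain hyphenated form
    "133-333-3333",     # area code starts with 1, so it is rejected
    "333-333-333",      # one digit short, so it is rejected
]

for s in samples:
    print(repr(s), looks_like_us_phone(s))
```

If at most one separator character should be allowed between groups, [-\s]* can be tightened to [-\s]?. Being this lenient is a design choice: the pattern checks the rough shape and the digit count only, and does not require the parentheses to be balanced.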

answered Dec 15 '17 at 22:33

Psidom


Wow, this is really good, thank you so much. The only other issue is accounting for whitespace, for example something like (333) 333 3333 or (333)-333 3333. – David Dec 15 '17 at 23:27
