5

I need to do pretty basic phone-number validation and formatting on all US and international phone numbers in Python. Here's what I have so far:

import re

class ValidationError(Exception):
    # stand-in so the snippet runs on its own; swap in your framework's exception
    pass

def validate(number):
    # strip everything that is not a digit
    number = re.sub(r'[^0-9]', '', number)
    if len(number) == 10:
        # ten-digit number, great
        return number
    elif len(number) == 7:
        # 7-digit number, should include area code
        raise ValidationError("INCLUDE YOUR AREA CODE OR ELSE.")
    else:
        # I have no clue what to do here
        pass  # placeholder so the snippet parses

def format(number):
    if len(number) == 10:
        # basically return XXX-XXX-XXXX
        return re.sub(r'^(\d{3})(\d{3})(\d{4})$', r'\1-\2-\3', number)
    else:
        # basically return +XXX-XXX-XXX-XXXX
        return re.sub(r'^(\d+)(\d{3})(\d{3})(\d{4})$', r'+\1-\2-\3-\4', number)

My main problem is that I have NO idea how international phone numbers work. I assume they're simply 10-digit numbers with a country code (\d+) in front of them. Is this true?

Naftuli Kay
  • possible duplicate of [A comprehensive regex for phone number validation](http://stackoverflow.com/questions/123559/a-comprehensive-regex-for-phone-number-validation) – mac Dec 06 '11 at 21:03
  • @TK Kocheran: Does the other question on SO answer your needs? If so, you can delete this question. – pyfunc Dec 06 '11 at 21:05
  • I don't think it's going to be nearly this simple. Just taking a look at [this page](http://en.wikipedia.org/wiki/Local_conventions_for_writing_telephone_numbers) shows me that several countries don't fit into the 10 digit format, even though it is very common. You also have to worry about invalid 10 digit numbers. For example, the number 555.XXX.XXXX is invalid in the US. So is 911.XXX.XXXX. These will vary by country. – Kris Harper Dec 06 '11 at 21:06
  • [Here's](https://github.com/daviddrysdale/python-phonenumbers) a Python port of Google's phone library. Maybe you can use that. – jan zegan Dec 06 '11 at 21:11
  • International phone numbers can have arbitrary length. Here in Germany, providers give numbers of different lengths even to customers in the same city. If my provider transmits the dialed number to me, I can even create new numbers by appending more digits to my assigned number and configuring my own PBX for them. – johannes Dec 06 '11 at 21:15
  • @Naftuli Did you use the [Python port](https://github.com/daviddrysdale/python-phonenumbers)? How was your experience? Do you recommend it? There is another Python package - [phonenumbers](https://pypi.python.org/pypi/phonenumbers); how do the two compare? – Pankaj Singhal Oct 13 '16 at 14:39

2 Answers

8

E.164 numbers can be up to fifteen digits long, and beyond the country code of 1-3 digits you should not expect them to fit any particular form. Certainly there are lots of countries where it is not XXX-XXX-XXXX. As I see it you have three options:

  1. Painstakingly create a database of the number formats for every country code. Then check each country individually for updates on a periodic basis. (Edit: it looks like Google already does this, so if you trust them and the Python porter to keep libphonenumber correct and up to date, and don't mind upgrading this library every time there is a change, that might work for you.)
  2. Eliminate all delimiters in the supplied telephone numbers and format them without any spacing: +12128675309
  3. Format the numbers as the user supplies them rather than reformatting them yourself incorrectly.
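
Option 2 can be sketched with the standard library alone. This is a minimal illustration, not a validator: the function name `to_e164_like` and the `default_country_code` parameter are hypothetical, and the sketch assumes that input without a leading + is a national number with no country code and no trunk prefix. It only strips delimiters and enforces the fifteen-digit ceiling mentioned above.

```python
import re

E164_MAX_DIGITS = 15  # digit ceiling for E.164, country code included


def to_e164_like(raw, default_country_code="1"):
    """Strip all delimiters and return '+<digits>'.

    default_country_code is a hypothetical parameter: it is prepended only
    when the input does not start with '+'. Assumes such input carries no
    country code and no national trunk prefix.
    """
    digits = re.sub(r"\D", "", raw)  # drop everything that is not a digit
    if not digits:
        raise ValueError("no digits in %r" % raw)
    if not raw.lstrip().startswith("+"):
        digits = default_country_code + digits
    if len(digits) > E164_MAX_DIGITS:
        raise ValueError("too many digits for E.164: %r" % raw)
    return "+" + digits


print(to_e164_like("+1 (212) 867-5309"))
print(to_e164_like("212-867-5309"))
```

Both calls print +12128675309. Whether a given digit string is actually assigned in its country is a separate question that needs per-country data, which is what option 1 is about.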
Michael Hoffman
  • Do you recommend the [Python port](https://github.com/daviddrysdale/python-phonenumbers)? There is another Python package - [phonenumbers](https://pypi.python.org/pypi/phonenumbers); how do the two compare? – Pankaj Singhal Oct 13 '16 at 14:40
3

I ignore the formatting, i.e. where the spaces and dashes are. Here is the regex function I use to validate that numbers:

  • optionally start with a + and some digits for the country code
  • optionally contain one set of brackets with digits inside, for the area code or an optional 0
  • finish with a digit
  • may contain spaces or dashes in the number itself (not in the country or area codes):
import re
def is_valid_phone(phone):
    # fullmatch: the whole string must fit the pattern, so it has to end with a digit
    return re.fullmatch(r'(\+[0-9]+\s*)?(\([0-9]+\))?[\s0-9\-]+[0-9]+', phone)
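
A few sample inputs show what this pattern accepts. The sketch below uses the same regex but anchors it with `fullmatch`, which is an assumption about the intent ("finish with a digit"). With plain `re.match`, only the start of the string is checked, so trailing text is tolerated. The sample numbers are made up for illustration.

```python
import re

# Same pattern as in the answer, compiled once.
PHONE_RE = re.compile(r'(\+[0-9]+\s*)?(\([0-9]+\))?[\s0-9\-]+[0-9]+')


def is_valid_phone(phone):
    # fullmatch requires the entire string to fit the pattern
    return PHONE_RE.fullmatch(phone) is not None


samples = [
    "+33 (0)1 23 45 67 89",   # country code, bracketed 0, spaced digits
    "(212) 867-5309",         # bracketed area code
    "+12128675309",           # no delimiters at all
    "212-867-5309 ext. 12",   # trailing text
    "212-867-",               # does not end with a digit
]
for s in samples:
    print(repr(s), is_valid_phone(s))

# Prefix-only matching accepts the string with trailing text:
print(PHONE_RE.match("212-867-5309 ext. 12") is not None)
```

The first three samples are accepted and the last two rejected. The final line shows that prefix-only matching would let "212-867-5309 ext. 12" through.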
David Ferenczy Rogožan
Erwan