I need a Python regex that matches mobile phone numbers from Germany and Austria.

To do so, we first have to understand the structure of a phone number:

  • A mobile number can be written with a country calling code at the beginning. However, this code is optional.
  • If we use the country calling code, the trunk prefix is redundant.
  • The prefix is composed of the trunk prefix and the company code.
  • The prefix is followed by an individual, unique number with 7 or 8 digits.

List of German prefixes:

  • 0151, 0160, 0170, 0171, 0175, 0152, 0162, 0172, 0173, 0174, 0155, 0157, 0159, 0163, 0176, 0177, 0178, 0179, 0164, 0168, 0169

List of Austrian prefixes:

  • 0664, 0680, 0688, 0681, 0699, 0667, 0650, 0678, 0677, 0676, 0660, 0690, 0665, 0686, 0670

Now that we know all the rules needed to build a regex, we have to consider that people sometimes write numbers in very strange ways, with multiple whitespaces, / or (). For example:

  • 0176 98 600 18 9
  • +49 17698600189
  • +(49) 17698600189
  • 0176/98600189
  • 0176 / 98600189
  • many more ways to write the same number

I am looking for a Python regex which can match all Austrian and German mobile numbers.

What I have so far is this:

^(?:\+4[39]|004[39]|0|\+\(49\)|\(\+49\))\s?(?=(?:[^\d\n]*\d){10,11}(?!\d))(\()?[19][1567]\d{1,2}(?(1)\))\s?\d(?:[ /-]?\d)+
PParker
  • *"many more ways to write the same number"* ...that part is problematic I'm afraid. – JvdV Jan 04 '22 at 12:05
  • I'd start with removing everything that is not a `+` at the beginning or a digit. – Klaus D. Jan 04 '22 at 12:06
  • Well, I think there are many if you are creative. But the regex doesn't have to match all of them. Maybe the most used ones. There is no perfect regex. – PParker Jan 04 '22 at 12:08
  • You don't want to match all the creative ways of writing a telephone number. You only want to match the digits. As Klaus said, remove everything but the digits before starting to match. – Tomalak Jan 04 '22 at 12:12
  • The use case is a chatbot. The purpose of the regex is to validate the user's input when the bot asks the user to write their phone number. The first step is always the validation with the corresponding regex. I cannot remove any parts of the input before the validation is completed. Therefore I need to find a regex which can somehow validate whether the user's input is nonsense or a real phone number. – PParker Jan 04 '22 at 12:20
  • No expert on the matter, but can't a chatbot say "Sorry, I don't recognize the input, please use pattern x or y", or even better, can't you control the input field? What I'm trying to say is: isn't there any way you can make life a little bit easier on yourself here? – JvdV Jan 04 '22 at 12:27
  • You can normalize a string before applying a regex. Unless, of course, you want to store the "creative" version and can't spare a few bytes of RAM to keep both in memory. – Klaus D. Jan 04 '22 at 12:39
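
The normalization suggested in the comments above could look roughly like the following. This is only a sketch: `normalize_phone` is a hypothetical helper name, and it assumes that a `+` anywhere before the first digit marks an international number. The original input stays untouched, so both versions can be kept.

```python
import re


def normalize_phone(raw: str) -> str:
    """Hypothetical helper: keep a leading '+' (if present) and all digits."""
    digits = re.sub(r"\D", "", raw)  # drop everything that is not a digit
    first_digit = re.search(r"\d", raw)
    # Text in front of the first digit, e.g. "+(" in "+(49) 176..."
    head = raw[: first_digit.start()] if first_digit else raw
    return ("+" if "+" in head else "") + digits


for text in ["0176 / 98600189", "+(49) 17698600189", "(+49) 176 98600189"]:
    print(repr(text), "->", normalize_phone(text))
```

After this step, a much simpler pattern is enough, because only an optional `+` and digits remain.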

1 Answer

You can use

(?x)^          # Free spacing mode on and start of string
 (?:           # A container group:
   (\+49|0049|\+\(49\)|\(\+49\))? [ ()\/-]*  # German: country code
   (?(1)|0)1(?:5[12579]|6[023489]|7[0-9])    #         trunk prefix and company code
 |                                           # or
   (\+43|0043|\+\(43\)|\(\+43\))? [ ()\/-]*  # Austrian:  country code
   (?(2)|0)6(?:64|(?:50|6[0457]|7[0678]|8[0168]|9[09])) # trunk prefix and company code
 )
 [ ()\/-]*   # zero or more spaces, parens, / and -
 \d(?:[ \/-]*\d){6,7} # a digit and then six or seven occurrences of space, / or - and a digit
 \s* # zero or more whitespace characters
$ # end of string

See the regex demo.

A one-line version of the pattern is

^(?:(\+49|0049|\+\(49\)|\(\+49\))?[ ()\/-]*(?(1)|0)1(?:5[12579]|6[023489]|7[0-9])|(\+43|0043|\+\(43\)|\(\+43\))?[ ()\/-]*(?(2)|0)6(?:64|(?:50|6[0457]|7[0678]|8[0168]|9[09])))[ ()\/-]*\d(?:[ \/-]*\d){6,7}\s*$

See this demo.
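
As a quick check in Python, the free-spacing pattern above can be compiled and run against the examples from the question. This is a minimal sketch: the names `MOBILE_RE` and `is_mobile_number` are made up for illustration, and `re.VERBOSE` is used in place of the inline `(?x)` flag.

```python
import re

# Same pattern as above; whitespace outside character classes is ignored.
MOBILE_RE = re.compile(r"""
    ^
    (?:
        (\+49|0049|\+\(49\)|\(\+49\))? [ ()\/-]*   # German country code
        (?(1)|0)1(?:5[12579]|6[023489]|7[0-9])     # trunk prefix and company code
      |
        (\+43|0043|\+\(43\)|\(\+43\))? [ ()\/-]*   # Austrian country code
        (?(2)|0)6(?:64|(?:50|6[0457]|7[0678]|8[0168]|9[09]))
    )
    [ ()\/-]*
    \d(?:[ \/-]*\d){6,7}                           # 7 or 8 digits in total
    \s*
    $
""", re.VERBOSE)


def is_mobile_number(text: str) -> bool:
    """Hypothetical helper: True if text looks like a DE/AT mobile number."""
    return MOBILE_RE.match(text) is not None


samples = [
    "0176 98 600 18 9",
    "+49 17698600189",
    "+(49) 17698600189",
    "0176/98600189",
    "0176 / 98600189",
    "+4317612345678",  # Austrian country code with a German prefix
]
for sample in samples:
    print(repr(sample), is_mobile_number(sample))
```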

How to create company code regex

  1. Go to the post "Optimize long lists of fixed string alternatives in regex"
  2. Click the Run code snippet button at the bottom of the answer to run the last code snippet
  3. Re-size the input box if you wish
  4. Get the list of your supported numbers, either comma or linebreak separated, and paste it into the field
  5. Click the Generate button and grab the pattern that appears below.
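
If you would rather keep the prefix list in Python and rebuild the company code part whenever it changes, a plain alternation is a possible alternative to the steps above. This is only a sketch, not what the linked tool does: `company_code_alternation` is a hypothetical helper, and the result is longer than the optimized pattern, although it should match the same set of codes.

```python
import re

# German mobile prefixes as listed in the question.
GERMAN_PREFIXES = [
    "0151", "0160", "0170", "0171", "0175", "0152", "0162", "0172", "0173",
    "0174", "0155", "0157", "0159", "0163", "0176", "0177", "0178", "0179",
    "0164", "0168", "0169",
]


def company_code_alternation(prefixes):
    """Hypothetical helper: turn national prefixes into a regex alternation.

    The leading trunk "0" is cut off because the main pattern matches it
    separately; duplicates are removed and the codes are sorted.
    """
    codes = sorted({p[1:] if p.startswith("0") else p for p in prefixes})
    return "(?:" + "|".join(re.escape(code) for code in codes) + ")"


german_codes = company_code_alternation(GERMAN_PREFIXES)
print(german_codes)
```

The generated group can then stand in for `1(?:5[12579]|6[023489]|7[0-9])` in the pattern, and adding a new prefix only means appending it to the list.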
Wiktor Stribiżew
  • Does it really? Austrian country code with German prefix is not matched: ``+4317612345678``. German country code with German prefix is matched: ``+4917612345678``. Seems right to me. – PParker Jan 04 '22 at 12:48
  • I need some time to fully understand and check the solution – PParker Jan 04 '22 at 12:48
  • It is important to me that I understand how I can add more prefixes. The prefixes (or more precisely, company codes) can change with time. Companies will add more numbers to the list. Therefore I have to know how to easily add them – PParker Jan 04 '22 at 12:51
  • @PParker I added the instructions. – Wiktor Stribiżew Jan 04 '22 at 13:36
  • Thank you very much for this impressive regex! – PParker Jan 04 '22 at 13:51