I have a regex for universal phone numbers:

`/^(\+\d)*\s*(\(\d{3}\)\s*)*\d{3}(-{0,1}|\s{0,1})\d{2}(-{0,1}|\s{0,1})\d{2}$/`

It is accepting the following strings:

339-4248 
(095) 2569835 
+7 (095) 1452389
+1(963)9632587
+12365874
2365789

But it's not accepting:

+12589637412
+1 963 9632587
+1701234567

What's the matter with this? Please help me figure out where I am wrong.

Kara
user958414
  • What language are you using to run this regex? PHP? C#? Different engines have different quirks to them. – Polynomial Oct 06 '11 at 08:36
  • possible dupe of http://stackoverflow.com/questions/123559/a-comprehensive-regex-for-phone-number-validation – Savino Sguera Oct 06 '11 at 08:50
  • What about `^.*$`? Simple, and it will catch any and all phone numbers you can think of. Thanks for at least accepting numbers with a `+` at the start, something companies as big as Google and Amazon get wrong. Still, it doesn't even match my own phone number in its usual form. – Joey Oct 06 '11 at 08:56
  • don't forget that dots are also in common use in phone number formats in some countries, and also in some international formats. – Spudley Oct 06 '11 at 09:12

3 Answers


Why do you care where users choose to break up the groups of digits, or which characters they use to do so? Around here (Sweden), it's common to see one person write a given phone number as 046 123 456 789 and someone else write it 046 123 45 67 89, but both are dialed identically and are equally valid. (As, for that matter, would be 04 61 2345 6 78 9 - not a format I've ever seen used, but it still dials identically.)

Just strip out non-numeric characters (other than a leading +, since that's meaningful), check that it's a reasonable number of digits, store that, and render it into your preferred format when displaying the number. Or keep the format as entered by the user, although then you need to take the normal precautions against SQL injection, XSS, XSRF, and similar attacks.
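As a rough illustration of the strip-and-count approach, here is a minimal Python sketch. The function name `normalize_phone` and the 7 to 15 digit bounds are assumptions made for this example (15 is the E.164 maximum; the lower bound is arbitrary), so adjust them to whatever "reasonable" means for your data.

```python
import re


def normalize_phone(raw, min_digits=7, max_digits=15):
    """Return the digits of `raw` (with a leading '+' kept), or None.

    Hypothetical helper: the name and the default bounds are assumptions
    for illustration, not part of any library.
    """
    text = raw.strip()
    # A leading '+' is meaningful (international prefix), so remember it.
    plus = "+" if text.startswith("+") else ""
    # Drop everything that is not an ASCII digit: spaces, dashes, dots, parens.
    digits = re.sub(r"[^0-9]", "", text)
    # Reject inputs whose digit count is implausible for a phone number.
    if not (min_digits <= len(digits) <= max_digits):
        return None
    return plus + digits


if __name__ == "__main__":
    for sample in ["+7 (095) 1452389", "046 123 45 67 89", "339-4248", "12"]:
        print(repr(sample), "->", normalize_phone(sample))
```

The stored value is then independent of how the user grouped the digits, and any display format can be produced from it later.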

Dave Sherohman

One thing you can do is to research all the formats. You have found a few good ones. There are more here: http://en.wikipedia.org/wiki/Local_conventions_for_writing_telephone_numbers

Next you want to find documents in your corpus that have phone numbers in them, and others that have numbers that aren't phone numbers. This matters less if you are dealing with structured data. The idea is that you want a control group to show you aren't overreaching.

Then you want to get something like visual-regexp (a common OS independent software package), put your text into it, and start creating regexes until you cover all of your cases.

Doing that with just your examples I came up with this: `regexp -nocase -all -line -- {+?(?[0-9])?\ ?[0-9-]} string match`

--Pete

Pete Mancini
  • I should point out that that regex won't work on free text at all. It picks up a single space as a legitimate hit. It was just an example. It takes trial after trial to get it right. – Pete Mancini May 09 '13 at 16:19
  • This one is a bit better, but it doesn't get every case: regexp -nocase -all -line -- {(\+|\()?([0-9]{1,4}[\-\.\ \(\)]+){1,3}[0-9]{4,7}} string match v1 v2 – Pete Mancini May 09 '13 at 21:47

The regex only accepts a fixed number of digits (exactly seven after the optional `+` prefix and parenthesized area code), and it only accepts spaces in some places within a number. My recommendation would be to ditch it and switch to a really simple, relaxed check, or else a documented, supported, internationally tested solution (libphonenumber or some such).
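To see this concretely, the sketch below runs the question's pattern over the question's own examples. It is a Python port, which is an assumption since the question never names its regex engine; the names `PATTERN` and `check` are made up for this example. The tail `\d{3}…\d{2}…\d{2}` always consumes exactly seven digits, and an area code is only recognized inside parentheses, so longer digit runs and a bare `963 ` prefix both fail.

```python
import re

# The pattern from the question, minus the /.../ delimiters.
PATTERN = re.compile(
    r"^(\+\d)*\s*(\(\d{3}\)\s*)*\d{3}(-{0,1}|\s{0,1})\d{2}(-{0,1}|\s{0,1})\d{2}$"
)

ACCEPTED = ["339-4248", "(095) 2569835", "+7 (095) 1452389",
            "+1(963)9632587", "+12365874", "2365789"]
REJECTED = ["+12589637412", "+1 963 9632587", "+1701234567"]


def check(number):
    """Return True if the question's pattern matches the whole string."""
    return PATTERN.match(number) is not None


if __name__ == "__main__":
    for number in ACCEPTED + REJECTED:
        # Count digits after dropping the '+<digit>' prefix and any
        # parenthesized area code; only a count of seven can match.
        rest = re.sub(r"^(\+\d)?\s*(\(\d{3}\))?", "", number)
        tail_digits = len(re.sub(r"[^0-9]", "", rest))
        print(f"{number!r:22} match={check(number)!s:5} tail digits={tail_digits}")
```

Note that `+1 963 9632587` fails for a different reason than the other two: its digit count would fit a parenthesized area code, but the pattern has no branch for an area code written without parentheses.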

tripleee