I have a bunch of numbers which I want to parse.

+79261234567
89261234567
79261234567
9261234567
+7 926 123 45 67
8(926)123-45-67
123-45-67
79261234567
(495)1234567
(495) 123 45 67
89261234567
8-926-123-45-67
8 927 1234 234
8 927 12 12 888
8 927 12 555 12
8 927 123 8 123

What I came up with at first is a single alternation that cycles through all the variants, like this:

(\+[\d]{11}|[\d]{10,11}|\+\d\ [\d]{3}\ [\d]{3}\ [\d]{2}\ [\d]{2}|\d\([\d]{3}\)[\d\-]{9}|[\d\ ]{14,15}|[\d\-]{14,15}|[\d\-]{9}|\(\d\d\d\)[\d\-]{9,10}|\(\d\d\d\)[\d\ ]{9,10}|\(\d\d\d\)[\d\-]{7})

Is there a more elegant way to match these numbers?

Wiktor Stribiżew
  • It's not very elegant trying to capture _all_ possible ways someone could write down a telephone number, so I guess there is no elegant regex either. – Good Night Nerd Pride May 31 '16 at 13:24
  • I'd just check if the string contains a minimum number of digits, and let the user type his phone number ;) – Washington Guedes May 31 '16 at 13:26
  • I agree. Why not remove anything but digits and plus signs (`[^+0-9]+`) from your input and then parse the rest? – Tim Pietzcker May 31 '16 at 13:27
  • Guys, this is not about programming, this is about parsing :) – Михаил Павлов May 31 '16 at 13:33
  • Why don't you strip all non-digits and act on the digit count? Why parse at all? https://github.com/googlei18n/libphonenumber/blob/master/FALSEHOODS.md comes to mind. – rr- May 31 '16 at 13:36
  • @TimPietzcker Yes, and also series of spaces. But this is what I have at the moment. – Михаил Павлов May 31 '16 at 13:36
  • Possible duplicate of [A comprehensive regex for phone number validation](http://stackoverflow.com/questions/123559/a-comprehensive-regex-for-phone-number-validation) – Ken White May 31 '16 at 13:48
  • There's an answer to a similar question (http://stackoverflow.com/a/123666/3652920 which is from the possible duplicate noted above) which seems like it might be applicable here. I think @Abbondanza is probably correct though, there's really no good way to do this generically, unless you can narrow down the set to specific countries/regions. Then you can at least rule out all the other possibilities. – CDahn May 31 '16 at 13:49
  • @МихаилПавлов your regex isn't actually parsing anything, it's just validating the format. – CDahn May 31 '16 at 14:01
  • @CDahn, any constructive point of your remark? – Михаил Павлов May 31 '16 at 14:03
  • @МихаилПавлов I'm just suggesting that maybe you're asking the wrong question, before this post gets closed as a duplicate. See the comment conversation with 4castle below. If you're actually trying to parse the phone numbers with a regex, then the regex needs to have capture groups which match different portions of the phone numbers, which will vary by country, region, national and international dialing, etc. What you've actually provided in your example is a phone number "validator", which is entirely different from a "parser." – CDahn May 31 '16 at 14:07
  • @CDahn `If you're actually trying to parse the phone numbers with a regex...` you got me right – Михаил Павлов May 31 '16 at 14:13
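
Several of the comments above suggest stripping everything except digits and plus signs and then acting on the digit count. A minimal Python sketch of that idea follows; the helper name `normalize` and the choice of samples are illustrative assumptions, not something from the thread.

```python
import re


def normalize(raw: str) -> str:
    """Drop every character that is not a digit or a plus sign."""
    return re.sub(r"[^+0-9]+", "", raw)


# A few of the formats from the question.
samples = ["+7 926 123 45 67", "8(926)123-45-67", "123-45-67", "(495) 123 45 67"]

for s in samples:
    cleaned = normalize(s)
    digit_count = len(cleaned.lstrip("+"))  # count digits only, ignoring a leading plus
    print(f"{s!r} -> {cleaned} ({digit_count} digits)")
```

After this step every sample collapses to a plain digit string, so the remaining logic can branch on length (7, 10 or 11 digits in the question's data) instead of on punctuation.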

2 Answers

To get a more elegant solution, you will have to make the pattern more relaxed. One option is to capture 7, 10, or 11 digits, each optionally preceded by any number of delimiters (spaces, parentheses, or hyphens):

\+?(?:[ ()-]*\d){10,11}|(?:[ ()-]*\d){7}

Regex101 Tested
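
As a rough check, the pattern above can be run against the question's samples in Python. The names `RELAXED` and `is_phone_like` are made up for this sketch, and `re.fullmatch` is used on the assumption that each input string holds exactly one candidate number.

```python
import re

# Pattern from this answer: 10-11 digits with an optional leading plus,
# or exactly 7 digits; each digit may be preceded by delimiters.
RELAXED = re.compile(r"\+?(?:[ ()-]*\d){10,11}|(?:[ ()-]*\d){7}")

samples = [
    "+79261234567", "89261234567", "79261234567", "9261234567",
    "+7 926 123 45 67", "8(926)123-45-67", "123-45-67",
    "(495)1234567", "(495) 123 45 67", "8-926-123-45-67",
    "8 927 1234 234", "8 927 12 12 888", "8 927 12 555 12", "8 927 123 8 123",
]


def is_phone_like(text: str) -> bool:
    """True when the whole string fits the relaxed pattern."""
    return RELAXED.fullmatch(text) is not None


for s in samples:
    print(f"{s!r}: {is_phone_like(s)}")
```

An 8-digit string such as `12345678` is rejected by this check, which is the limitation raised in the comments below.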

4castle
  • This is kind of kicking the can down the road though.. What happens when it's an 8 digit number, or 14? You'll end up in the same situation the OP is in now, just with OR'ing together relaxed length regexes. – CDahn May 31 '16 at 13:45
  • @CDahn They could just say 7 or more numbers then: `\+?(?:[ ()-]*\d){7,}` – 4castle May 31 '16 at 13:50
  • Yeah, but I think the spirit of the original post is to validate certain formats. While the OP didn't say it, I think what he actually wants is to validate country codes, etc. If you're just trying to identify that you have 10 numbers, then I would definitely agree that your regex is the correct way to go about it, but surely he can't mean that.... Right? :-) – CDahn May 31 '16 at 13:59
  • @CDahn Could be, but if this is just about parsing numbers as he says, then validating for a real phone number would probably be done by the code that receives the match. – 4castle May 31 '16 at 14:01
  • Agreed. If he's got other code that actually parses up the number, as his post is asking, then I'd encourage him to just use that and scrap the regex altogether. The parser can tell him whether it's a valid phone number or not. – CDahn May 31 '16 at 14:04
  • @CDahn I doubt there is a sane person who will do validation to all these kinds of format in one project. Validation is when you get data and check if it is good, usually there is one valid format during validation. – Михаил Павлов May 31 '16 at 14:21
  • @МихаилПавлов Sorry, I guess I misunderstood your post then. When you made a regex with one long string of alternations, I thought that you were trying to do all these kinds of formats in one project. – CDahn May 31 '16 at 16:56

This regex will match all of the examples and not much extra:

[+]?(\b\d{1,2}[ -]?)?([(]?\d{3}[)]?)((?:[ -]?\d){4,7})(?![ -]?\d)

A match can contain between 7 and 12 digits.

It would still match something like this, though:

+12 (345) 6-7-8 9-0-1

But that should be within acceptable limits.
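
To see what the three capture groups pick up, here is a small Python sketch. The name `GROUPED` is an assumption for illustration, and `re.fullmatch` is used because each test string is a single number.

```python
import re

# Pattern from this answer: optional 1-2 digit prefix, 3-digit code, 4-7 further digits.
GROUPED = re.compile(
    r"[+]?(\b\d{1,2}[ -]?)?([(]?\d{3}[)]?)((?:[ -]?\d){4,7})(?![ -]?\d)"
)

# With separators, the groups line up with prefix / code / subscriber part.
spaced = GROUPED.fullmatch("+7 926 123 45 67")
print(spaced.groups())

# Without separators the prefix group greedily takes two digits.
compact = GROUPED.fullmatch("+79261234567")
print(compact.groups())

# The loosely punctuated example from the answer is accepted as well.
loose = GROUPED.fullmatch("+12 (345) 6-7-8 9-0-1")
print(loose is not None)
```

For the compact form the first group comes out as `79` rather than `7`, so the groups are better treated as a rough split than as a reliable parse of the country code.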

However, that pattern could still match part of a longer number.
Avoiding that requires negative look-behinds
(note that JavaScript regex only gained look-behinds in ES2018, so older engines do not support them):

[+]?(?<!\d)(?<!\d[ -])(?:((\d{1,2}[ -]?)?[(]?\d{3}[)]?[ -]?)(\d(?:[ -]?\d){3,6}))(?![ -]?\d)

Here's a regex101 test for that last one.
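
The effect of the look-behinds can be sketched in Python, whose `re` module supports fixed-width look-behinds. The names `WITHOUT_LB` and `WITH_LB` and the 16-digit test string are assumptions chosen for this illustration.

```python
import re

# First pattern from this answer (no look-behinds).
WITHOUT_LB = re.compile(
    r"[+]?(\b\d{1,2}[ -]?)?([(]?\d{3}[)]?)((?:[ -]?\d){4,7})(?![ -]?\d)"
)
# Second pattern, with negative look-behinds guarding the start of the match.
WITH_LB = re.compile(
    r"[+]?(?<!\d)(?<!\d[ -])"
    r"(?:((\d{1,2}[ -]?)?[(]?\d{3}[)]?[ -]?)(\d(?:[ -]?\d){3,6}))"
    r"(?![ -]?\d)"
)

too_long = "1234567890123456"  # 16 digits, not a phone number in the question's sense

# Without look-behinds the search picks up the tail of the digit run.
tail = WITHOUT_LB.search(too_long)
print(tail.group(0))

# With look-behinds no part of the long run is accepted.
print(WITH_LB.search(too_long))

# A number embedded in ordinary text is still found.
found = WITH_LB.search("call 8(926)123-45-67 now")
print(found.group(0))
```

The look-behinds only reject a match that starts right after a digit (or a digit plus one separator), so the second pattern still works when the number is surrounded by ordinary text.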

LukStorms
  • Putting the `\d` inside the character class opens up the ability for the number to be entirely non-digits and ending in a digit. Such as `------2` – 4castle May 31 '16 at 13:55
  • This one will fail for most international numbers with code that has more than 2 numbers. E.g. +493025554433, +330140205317, etc – SublimeYe Feb 12 '18 at 19:02
  • @SublimeYe Should be ok for those also now. Those were not in the samples. – LukStorms Feb 12 '18 at 23:22