
I'm trying to figure out how to write my own regex.

I made a list of valid and invalid phone numbers, and I'm trying to make sure only the valid ones match, but I can't figure out how to finish it.

Allowed list

0665363636
06 65 36 36 36
06-65-36-36-36
+33 6 65 36 36 36

Not allowed

06 65 36 36
2336653636
+3366536361
0065363636

I messed around with it a bit and I currently have this:

[0+][63][6 \-3][56\ ][\d{1}][\d \-]\d{2}[\d{1} \-]\d\d? ?\-?\d?\d? ?\d?\d?$

This blocks numbers 2 and 4 of the not-allowed list, but I can't figure out how to block the other ones.

Should I enforce a minimum number of digits? If so, how would I do this?

Yu Hao
Martijn Kerckhaert
  • Lucas provided the correct answer. Another way to do this would be to clean up incoming phone numbers with gsub so there are no spaces or hyphens left in the number (the + is kept): `phone_number.gsub(/[^\d\+]/, "")`. Then the regex can be as small as this: `/^(0|\+33)[1-9]\d{8}$/` – Martijn Kerckhaert May 09 '15 at 13:41
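A minimal sketch of that normalize-then-match idea follows. The method name `valid_phone?` is hypothetical, and the sketch swaps `^`/`$` for `\A`/`\z` so the pattern has to match the whole string rather than just one line of it.

```ruby
# Hypothetical helper: normalize first, then validate the compact form.
def valid_phone?(phone_number)
  # Drop every character that is not a digit or "+" (spaces, hyphens, dots, ...).
  compact = phone_number.gsub(/[^\d+]/, "")
  # 0 or +33, then a digit 1-9, then exactly 8 more digits; \A and \z anchor the whole string.
  compact.match?(/\A(?:0|\+33)[1-9]\d{8}\z/)
end

["0665363636", "06 65 36 36 36", "06-65-36-36-36", "+33 6 65 36 36 36"].each do |s|
  puts "#{s}: #{valid_phone?(s)}" # each line ends in true
end
```

One trade-off: because separators are thrown away before matching, this also accepts inputs such as `06.65.36.36.36` or `0 6 6 5 3 6 3 6 3 6`, which the stricter per-format regexes in the answers reject.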

2 Answers


[Edit: After posting this I see it is very similar to @Lucas' answer. I will let it stand, however, for the alternative presentation.]

I would try constructing a regex for each of the allowed patterns and then take their union to obtain a single regex.

We see that all of the allowable numbers not beginning with + have 10 digits, so I will assume that's a requirement. If different numbers of digits are permitted, that can be dealt with easily.
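For example, if (hypothetically) 9-digit numbers were also acceptable, a range quantifier such as `{7,8}` would cover both lengths; the same syntax with an open upper bound, e.g. `\d{8,}`, expresses "at least 8 digits". The name `r1_flexible` is illustrative, and `\A`/`\z` are used so the whole string must match:

```ruby
# Hypothetical requirement: 9 or 10 digits in total
# (0, then a digit 1-9, then 7 or 8 more digits).
r1_flexible = /\A0[1-9]\d{7,8}\z/

p "0665363636".match?(r1_flexible) # true  (10 digits)
p "066536363".match?(r1_flexible)  # true  (9 digits)
p "06653636".match?(r1_flexible)   # false (8 digits)
```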

1. Include 0665363636, exclude 2336653636 and 0065363636

I assume this means the number must begin with the digit 0 and the second digit must not be 0. That's easy:

r1 = /
     ^     # match start of string
     0     # match 0
     [1-9] # match any digit 1-9
     \d{8} # match 8 digits
     $     # match end of string
     /x

Test:

'0665363636' =~ r1 #=> 0
'2336653636' =~ r1 #=> nil
'0065363636' =~ r1 #=> nil 

That seems to work.

2. Include 06 65 36 36 36, exclude 06 65 36 36

Another easy one:

r2 = /
     ^       # match start of string
     0       # match 0
     [1-9]   # match any digit 1-9; use \d if the second digit may be 0
     (?:     # begin a non-capture group
       \s    # match one whitespace
       \d{2} # match two digits
     )       # end non-capture group
     {4}     # match the group 4 times
     $       # match end of string
     /x

Test:

'06 65 36 36 36' =~ r2 #=> 0
'06 65 36 36'    =~ r2 #=> nil

Another apparent success!

We see that 06-65-36-36-36 should also be permitted. That's such a small variant of the above that we don't have to bother creating another regex to include in the union; instead we just modify r2 ever-so-slightly:

r2 = /^0[1-9](?:
      [\s-] # match one whitespace or a hyphen
      \d{2}){4}$
     /x

Notice that we don't have to escape the hyphen when it's the first or last character in a character class.

Test:

'06 65 36 36 36' =~ r2 #=> 0
'06-65-36-36-36' =~ r2 #=> 0

Yes!
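One consequence of the modified r2 is that it also accepts mixed separators such as `06 65-36 36-36`. If that should be rejected, a sketch using a capture group and a backreference (`\1`) can require every separator to repeat the first one. The name `r2_consistent` is illustrative, and it uses `\A`/`\z` in place of `^`/`$`:

```ruby
r2 = /^0[1-9](?:[\s-]\d{2}){4}$/ # compact, non-/x form of the modified r2

# Capture the first separator, then require the same one (\1) before each later pair.
r2_consistent = /\A0[1-9]([\s-])\d{2}(?:\1\d{2}){3}\z/

p "06 65-36 36-36".match?(r2)            # true: mixed separators slip through
p "06 65-36 36-36".match?(r2_consistent) # false
p "06 65 36 36 36".match?(r2_consistent) # true
p "06-65-36-36-36".match?(r2_consistent) # true
```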

3. Include +33 6 65 36 36 36, exclude +3366536361

It appears that, when the number begins with a +, the + must be followed by two digits, a space, one digit, then four pairs of digits, each preceded by a space. We can just write that down:

r3 = /
     ^       # match start of string
     \+      # match +
     \d\d    # match two digits
     \s\d    # match one whitespace followed by a digit
     (?:     # begin a non-capture group
       \s    # match one whitespace
       \d{2} # match two digits
     )       # end non-capture group
     {4}     # match the group 4 times
     $       # match end of string
     /x

Test:

'+33 6 65 36 36 36' =~ r3 #=> 0
'+3366536361'       =~ r3 #=> nil

Nailed it!

Unionize!

r = Regexp.union(r1, r2, r3)
 => /(?x-mi:
         ^     # match start of string
         0     # match 0
         [1-9] # match any digit 1-9
         \d{8} # match 8 digits
         $     # match end of string
         )|(?x-mi:^0[1-9](?:
          [\s-] # match one whitespace or a hyphen
          \d{2}){4}$
         )|(?x-mi:
         ^       # match start of string
         \+      # match +
         \d\d    # match two digits
         \s\d    # match one whitespace followed by a digit
         (?:     # begin a non-capture group
           \s    # match one whitespace
           \d{2} # match two digits
         )       # end non-capture group
         {4}     # match the group 4 times
         $       # match end of string
         )/ 

Let's try it:

['0665363636', '06 65 36 36 36', '06-65-36-36-36',
 '+33 6 65 36 36 36'].any? { |s| (s =~ r).nil? } #=> false

['06 65 36 36', '2336653636', '+3366536361',
 '0065363636'].all? { |s| (s =~ r).nil? } #=> true

Bingo!
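One caveat before using r for validation: in Ruby, `^` and `$` match at the start and end of any line, not only of the whole string, and `$` also matches just before a trailing newline. For checking a single input, `\A` and `\z` are usually the safer anchors. A small demonstration with r1's pattern (the names `line_anchored` and `string_anchored` are illustrative):

```ruby
line_anchored   = /^0[1-9]\d{8}$/   # same pattern as r1, written compactly
string_anchored = /\A0[1-9]\d{8}\z/ # anchors at the very start and very end

p "junk\n0665363636".match?(line_anchored)   # true: ^ matches after the newline
p "junk\n0665363636".match?(string_anchored) # false
p "0665363636\n".match?(line_anchored)       # true: $ matches before the final newline
p "0665363636\n".match?(string_anchored)     # false
```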

Efficiency

Unionizing individual regexes may not produce the most efficient single regex. You must decide whether the benefits of easier initial construction and testing, and of ongoing maintenance, are worth the efficiency penalty. If efficiency is paramount, you might still construct r this way, then tune it by hand.
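As one possible hand-tuned version (a sketch, not benchmarked), the two branches that start with `0[1-9]` can share that prefix, and the whole alternation can be grouped so a single pair of anchors covers it. It is intended to accept the same single-line inputs as r, but because it uses `\A`/`\z` it also rejects a trailing newline, which r's `$` would allow. The name `r_tuned` is illustrative:

```ruby
# Hand-factored alternative to Regexp.union(r1, r2, r3):
# the 0-prefixed branches share "0[1-9]", and \A/\z anchor the whole string.
r_tuned = /\A(?:0[1-9](?:\d{8}|(?:[\s-]\d{2}){4})|\+\d\d\s\d(?:\s\d{2}){4})\z/

allowed     = ["0665363636", "06 65 36 36 36", "06-65-36-36-36", "+33 6 65 36 36 36"]
not_allowed = ["06 65 36 36", "2336653636", "+3366536361", "0065363636"]

p allowed.all? { |s| s.match?(r_tuned) }      # true
p not_allowed.none? { |s| s.match?(r_tuned) } # true
```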

Cary Swoveland

Looks like you want to limit the allowed phone numbers to French mobile phone numbers only.

You made a list of valid and invalid strings, which is a good starting point. But then, I think you just wanted to write the pattern in one shot, which is error-prone.

Let's follow a simple methodology and go through the allowed list and craft a very simple regex for each one:

0665363636         -> ^06\d{8}$
06 65 36 36 36     -> ^06(?: \d\d){4}$
06-65-36-36-36     -> ^06(?:-\d\d){4}$
+33 6 65 36 36 36  -> ^\+33 6(?: \d\d){4}$
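Each of these small patterns can be checked against its own sample, and against the not-allowed list, before they are combined. A quick Ruby sketch, using `\A`/`\z` instead of `^`/`$` so each pattern must cover the whole string:

```ruby
# One small pattern per allowed format, keyed by the sample it should accept.
cases = {
  "0665363636"        => /\A06\d{8}\z/,
  "06 65 36 36 36"    => /\A06(?: \d\d){4}\z/,
  "06-65-36-36-36"    => /\A06(?:-\d\d){4}\z/,
  "+33 6 65 36 36 36" => /\A\+33 6(?: \d\d){4}\z/
}
not_allowed = ["06 65 36 36", "2336653636", "+3366536361", "0065363636"]

# Prints matches=true and rejects_bad=true for every format.
cases.each do |sample, pattern|
  rejects_bad = not_allowed.none? { |s| s.match?(pattern) }
  puts "#{sample.inspect}: matches=#{sample.match?(pattern)}, rejects_bad=#{rejects_bad}"
end
```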

So far so good.

Now, just combine everything into one regex, and factor it a bit (the 06 part is common in the first 3 cases). The alternation is wrapped in a non-capturing group so that ^ and $ apply to every branch:

^(?:06(?:\d{8}|(?: \d\d){4}|(?:-\d\d){4})|\+33 6(?: \d\d){4})$

Et voilà.
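Note that `|` has the lowest precedence of any regex operator, so in a pattern shaped like `^A|B$` the `^` anchors only A and the `$` anchors only B; wrapping the alternation in `(?:...)` applies both anchors to every branch. A quick check with an over-long input (the names `ungrouped` and `grouped` are illustrative):

```ruby
# Alternatives not wrapped in a group: ^ belongs only to the first branch, $ only to the last.
ungrouped = /^06(?:\d{8}|(?: \d\d){4}|(?:-\d\d){4})|\+33 6(?: \d\d){4}$/
# Alternation wrapped in (?:...): both anchors apply to every branch.
grouped   = /^(?:06(?:\d{8}|(?: \d\d){4}|(?:-\d\d){4})|\+33 6(?: \d\d){4})$/

p "0665363636999".match?(ungrouped) # true: the 06 branch has no end anchor
p "0665363636999".match?(grouped)   # false
```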


As a side note, French mobile phone numbers can also start with 07, so you should use this instead:

^(?:0[67](?:\d{8}|(?: \d\d){4}|(?:-\d\d){4})|\+33 [67](?: \d\d){4})$

Lucas Trzesniewski
  • Thank you, this explains the thought process clearly for me. However, when I use the site my university gave me, it shows only the 4th one as matching. The site they provided was rubular.com. EDIT: seems like rubular broke on the ^ – Martijn Kerckhaert May 09 '15 at 12:54
  • @Martijn this works for me on rubular, see [this link](http://rubular.com/r/nhXOpF6QGq), you may have to use the `m` option also. – Lucas Trzesniewski May 09 '15 at 12:58