
I want to replace the phone numbers in an HTML string, such as

<a>click here now! (123) -456-789</a>

I think the best way to approach it would be to find every place where something looks like a phone number, using a pattern such as:

$pattern = *any 3 numbers* *any characters up to 3 characters long* 
$pattern .= *any 3 numbers* *any characters up to 3 characters long* 
$pattern .= *any numbers up to 4 numbers long*

// $pattern maybe something like [0-9]{3}\.?([0-9]{3})\.?([0-9]{4})

preg_match_all($pattern, $string, $matches);

foreach ($matches[0] as $match)
{
    // replace the matched string with the new phone number
}

Basically, what would the regex be?
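Read literally, the three pseudocode rules translate to roughly the following PHP. This is only a sketch: `\D{0,3}` is an assumption standing in for "any characters up to 3 characters long", and the replacement number 555-555-0199 is a made-up placeholder.

```php
<?php
// Sketch: the three pseudocode rules, read literally:
//   3 digits, up to 3 non-digit characters,
//   3 digits, up to 3 non-digit characters,
//   1 to 4 digits.
$pattern = '/\d{3}\D{0,3}\d{3}\D{0,3}\d{1,4}/';

$html = '<a>click here now! (123) -456-789</a>';

// preg_match_all() returns the number of matches and fills $matches by reference.
$count = preg_match_all($pattern, $html, $matches);

// preg_replace() rewrites every match in one call, so no manual loop is needed.
$replaced = preg_replace($pattern, '555-555-0199', $html);

echo $count, "\n";
echo $replaced, "\n";
```

The match starts at the first digit, so the opening parenthesis survives the replacement (`(555-555-0199`). Also note that `\D` matches letters and tag characters too, not just punctuation.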

Jacob Kranz
  • How general do you need it? Do you know for sure the phone numbers will be formatted as `(123) 456-7890`, or does it have to handle any sort of spacing (or not), or periods/parens/hyphens/etc? – CmdrMoozy Jun 26 '13 at 22:14
  • No idea how it will be formatted; that's why I'm assuming the separators can be up to 3 characters long. I am going to assume it's not something like (123) - 456 - 7890. – Jacob Kranz Jun 26 '13 at 22:15
  • The key word here is *regular* expression. The data you're looking to match appears to be inherently irregular, so your matching is going to be spotty *at best*. – Sammitch Jun 26 '13 at 22:15
  • Please use `\d` instead of the ugly `[0-9]` while you can :) – HamZa Jun 26 '13 at 22:16
  • I do know that there will be a certain rule, such as (111) 222-333 or 111-222-333. I'm assuming it won't be spelled out (such as one11-222-333). – Jacob Kranz Jun 26 '13 at 22:16
  • My general question is "what would the regex be?" I'm open to discussing the logic behind it, but I think having these two regexes would be the best way to approach this problem. – Jacob Kranz Jun 26 '13 at 22:18
  • how about this answer: http://stackoverflow.com/a/123666/498699 – Schleis Jun 26 '13 at 22:19
  • possible duplicate of [A comprehensive regex for phone number validation](http://stackoverflow.com/questions/123559/a-comprehensive-regex-for-phone-number-validation) –  Jun 26 '13 at 22:20
  • @Dagon I am not trying to validate phone numbers, I'm trying to replace them – Jacob Kranz Jun 26 '13 at 22:42
  • same principle when it comes to using regular expressions –  Jun 26 '13 at 22:46
  • @Dagon it's a similar principle, but validating phone numbers requires much more work than matching phone numbers for stripping. – AbsoluteƵERØ Jun 27 '13 at 15:31
  • Before you can write a regular expression, you have to be able to describe, in English, the rules that you're trying to implement. – Andy Lester Jun 27 '13 at 16:48
  • We're trying to help, and your attitude does you no favors. "Looks like a phone number" is an inadequate description of the problem, and your three lines of rules don't take into account punctuation, and are inaccurate because you don't want "up to 3 characters long", but rather "exactly three characters long". So what variations of "looks like a phone number" do you want to handle? Write out exact rules you want to check for, and then we can help you define a regex to implement that. – Andy Lester Jun 28 '13 at 16:25

3 Answers


Based on the Wikipedia entry "Local conventions for writing telephone numbers", there are a variety of formats in use globally if you want to strip out ALL phone numbers. In the following examples the placeholder 0 represents a digit. The following is a sample from the wiki entry (there may be duplicates).

0 (000) 000-0000
0000 0000
00 00 00 00
00 000 000
00000000
00 00 00 00 00
+00 0 00 00 00 00
00000 000000
+00 0000 000000
(00000) 000000
+00 0000 000000
+00 (0000) 000000
00000-000000
00000/000000
000 0000
000-000-000
0 0000 00-00-00
(0 0000) 00-00-00
0 000 000-00-00
0 (000) 000-00-00
000 000 000
000 00 00 00
000 000 000
000 000 00 00
+00 00 000 00 00
0000 000 000
(000) 0000 0000
(00000) 00000
(0000) 000 0000
0000 000 0000
0000-000 0000
0000 000 0000
00000 000000
0000 000000
0000 000 00 00
+00 000 000 00 00
(000) 0000000
+00 00 00000000
000 000 000
+00-00000-00000
(0000) 0000 0000
+00 000 0000 0000
(0000) 0000 0000
+00 (00) 000 0000
+00 (0) 000 0000
+00 (000) 000 0000
(00000) 00-0000
(000) 000-000-0000
(000) [00]0-000-0000
(00000) 0000-0000
+ 000 0000 000000
8.8.8.8
192.168.1.1
0 (000) 000-0000 ext 1
0 (000) 000-0000 x 1001
0 (000) 000-0000 extension 2
0 000 000-0000 code 3

While you could try to write an elaborate regex that qualifies each number based on its country code, dialing prefix, etc., for your matching purposes this is not needed and would be a waste of time. From a Bayesian approach, the longest numbers tend to be 18 characters (Argentine mobile numbers), possibly starting with a + character, followed by digits ([0-9] or \d), parentheses (), brackets [], and possibly spaces, periods (.), or hyphens (-), plus one obscure format that uses a /.

\b\+?[0-9()\[\]./ -]{7,17}\b
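A quick way to see what this base pattern actually captures is to run it over a few of the sample formats. The sketch uses `!` as the delimiter (as the combined pattern later in this answer does), because the character class contains a `/` that would otherwise end the pattern early. One thing it reveals: since `\b` sits in front of `\+?` and the class, a match cannot begin on a leading `+` or `(`, so those characters are left outside the match.

```php
<?php
// Sketch: the base pattern, with "!" as delimiter because the
// character class contains "/".
$base = '!\b\+?[0-9()\[\]./ -]{7,17}\b!';

$samples = ['0000 0000', '000-000-000', '+00 0000 000000', '(000) 0000 0000', '192.168.1.1'];

foreach ($samples as $s) {
    // preg_match() returns 1 on a match and puts the matched text in $m[0].
    if (preg_match($base, $s, $m) === 1) {
        echo $s, ' => ', $m[0], "\n";
    } else {
        echo $s, " => no match\n";
    }
}
```

For `+00 0000 000000` the match is `00 0000 000000`, and for `(000) 0000 0000` it is `000) 0000 0000`, so a replacement leaves a stray `+` or `(` behind.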

For all of these numbers we'll also allow the following extension formats to be appended:

ext 123456
x 123456
# 123456
EXT 123456
- 123456
code 2
-12
Extension 123456

\b\+?[0-9()\[\]./ -]{7,17}\s+(extension|x|#|-|code|ext)\s+[0-9]{1,6}

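So in total you would look for phone numbers with extensions or plain phone numbers. List the extension alternative first: PCRE tries alternatives from left to right and takes the first one that succeeds, so if the plain-number branch came first it would normally win and the extension would be left behind.

$pattern = '!(\b\+?[0-9()\[\]./ -]{7,17}\s+(extension|x|#|-|code|ext)\s+[0-9]{1,6}|\b\+?[0-9()\[\]./ -]{7,17}\b)!i';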
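The order of the two alternatives matters, because PCRE takes the first alternative that succeeds at a given position. This sketch builds both orderings from the same pieces and runs them over one extension sample; the input string is a made-up example in the placeholder format used above.

```php
<?php
$num = '\b\+?[0-9()\[\]./ -]{7,17}';                 // shared number part
$ext = '\s+(extension|x|#|-|code|ext)\s+[0-9]{1,6}'; // extension suffix

// Plain number first, then number + extension.
$numberFirst = '!(' . $num . '\b|' . $num . $ext . ')!i';
// Number + extension first, then the plain number.
$extFirst    = '!(' . $num . $ext . '|' . $num . '\b)!i';

$s = '0 (000) 000-0000 ext 1';

echo preg_replace($numberFirst, '*Phone*', $s), "\n";
echo preg_replace($extFirst, '*Phone*', $s), "\n";
```

With the plain-number branch first, the result is `*Phone*ext 1`: the extension is left behind (and the space before it is swallowed by the greedy character class). With the extension branch first, the whole string becomes `*Phone*`.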

Note that this will also strip IP addresses. If you want to keep IP addresses, you will need to replace the periods in the IP addresses with something that will not match the phone number regex, then switch them back afterwards.
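The protect-then-restore idea can be sketched like this. Assumptions: only dotted IPv4 addresses need protecting, the function name `replacePhonesKeepIps` is hypothetical, and the placeholder `{DOT}` is a made-up token assumed never to occur in the input. The phone pattern here is the base pattern without extensions, to keep the example short.

```php
<?php
// Sketch of the "protect IP addresses, then restore them" workaround.
// Hypothetical helper; assumes "{DOT}" never appears in $text.
function replacePhonesKeepIps(string $text): string
{
    $phone = '!\b\+?[0-9()\[\]./ -]{7,17}\b!'; // base phone pattern
    $ipv4  = '/\b\d{1,3}(?:\.\d{1,3}){3}\b/';  // dotted IPv4 address

    // 1. Swap the periods inside each IPv4 address for the placeholder.
    $text = preg_replace_callback($ipv4, function (array $m): string {
        return str_replace('.', '{DOT}', $m[0]);
    }, $text);

    // 2. Replace phone numbers; each protected IP is now split into
    //    runs of at most 3 digits, too short for the phone pattern.
    $text = preg_replace($phone, '*Phone*', $text);

    // 3. Switch the periods back.
    return str_replace('{DOT}', '.', $text);
}

echo replacePhonesKeepIps('Phone: 000-000-0000; DNS: 8.8.8.8'), "\n";
```

Without the protection step, the same input would become `Phone: *Phone*; DNS: *Phone*`.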

So for your code you would use:

$string = preg_replace($pattern,'*Phone*',$string);

Here's a PHP fiddle of the matching test.

AbsoluteƵERØ

I think this will match two sets of three digits and a set of four digits, with "common" phone number punctuation in-between:

\d{3}[().\s\[\]-]*\d{3}[().\s\[\]-]*\d{4}

This allows for three digits, then any number of punctuation characters or spaces, then three more digits, then more punctuation, then four digits.
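A quick check of this pattern against a few formats. The sketch places the hyphen last in each character class: written as `.-\s`, PCRE reads it as a range ending in `\s`, which current PHP versions (PCRE2-based since PHP 7.3) reject as an invalid range. The sample strings are made up.

```php
<?php
// Sketch: the 3-3-4 pattern with the hyphen placed last in each
// character class, where PCRE treats it as a literal "-".
$pattern = '/\d{3}[().\s\[\]-]*\d{3}[().\s\[\]-]*\d{4}/';

$samples = ['(123) 456-7890', '123.456.7890', '1234567890', '(111) 222-333'];

foreach ($samples as $s) {
    // preg_match() returns 1 when the pattern is found anywhere in $s.
    echo $s, ' => ', (preg_match($pattern, $s) === 1 ? 'match' : 'no match'), "\n";
}
```

The last sample is not matched: a nine-digit form like the (111) 222-333 mentioned in the comments has no final group of four digits.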

However, without a better idea of the formatting of the input, you will never really be sure that you're going to get only phone numbers and not something else, or that you won't skip over any phone numbers.

If you want to replace the number you find with your own number, I might try something like this:

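preg_replace('/\d{3}([().\s\[\]-]*)\d{3}([().\s\[\]-]*)\d{4}/',
    '123${1}456${2}7890', $input);

In the replacement string, ${1} and ${2} are the two parenthesized blocks of punctuation between the numbers. The braces matter: written as $1456, PHP would read a reference to group 14 followed by "56". The single quotes keep PHP itself from treating ${1} as variable interpolation. This way you replace just the digits you find and leave the punctuation alone by inserting the same punctuation back into the resulting string.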
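A runnable version of the capture-group replacement, using a hypothetical input line and the made-up replacement number 555-555-0199, split into the same three digit groups so the captured punctuation lands between them.

```php
<?php
// Hyphen placed last in each class so PCRE treats it as a literal "-".
$pattern = '/\d{3}([().\s\[\]-]*)\d{3}([().\s\[\]-]*)\d{4}/';
$input   = 'Call (123) 456-7890 now';

// ${1} and ${2} re-insert the captured punctuation; the braces keep PCRE
// from reading "$1456" as group 14, and single quotes stop PHP interpolation.
$output = preg_replace($pattern, '555${1}555${2}0199', $input);

echo $output, "\n";
```

The output is `Call (555) 555-0199 now`: the `(` before the match, and the `) ` and `-` captured inside it, are all preserved.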

CmdrMoozy
  • So, now that we have this match, how would we go about replacing the numbers themselves with my own phone number? And I absolutely appreciate your comment about irregular expression. I understand that phone numbers are a pain to do, but I think this would be the best route to take (maybe you disagree?) – Jacob Kranz Jun 26 '13 at 22:23
  • If you have no control over the output you're parsing, then this is probably about as good as you can do (you could make the regex more complicated to make sure it's a valid phone number, for some improvement). Do you need your phone number to be in the same format? If not, how about something like: `preg_replace($pattern, "(123) 456-7890", $input);`? If the formatting is important, then you'll probably want to look into using capture groups. – CmdrMoozy Jun 26 '13 at 22:24

Here is the function I use. I downloaded it from somewhere, but I don't remember where.

/*
// PHP function to validate US phone number:
// (c) 2003
// No restrictions have been placed on the use of this code
//
// Updated Friday Jan 9 2004 to optionally ignore the area code:
//
// Input: a single string parameter and an optional boolean variable (default=true)
// Output: 10 digit telephone number (7 digits if the area code is ignored and omitted) or boolean false
//
// The function will return the numerical part of the alphanumeric string
// parameter with the following sequence of characters:
// any number of spaces [optional],
// a single open parenthesis [optional],
// any number of spaces [optional],
// 3 digits (area code),
// any number of spaces [optional],
// a single close parenthesis [optional],
// a single dash [optional],
// any number of spaces [optional],
// 3 digits, any number of spaces [optional],
// a single dash [optional],
// any number of spaces [optional],
// 4 digits, any number of spaces [optional]:
*/
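function validate_USphone($phonenumber, $useareacode = true)
{
    // (area code) + 3 digits + 4 digits, with optional spaces, parentheses and dashes
    $full  = "/^[ ]*[(]{0,1}[ ]*[0-9]{3,3}[ ]*[)]{0,1}[-]{0,1}[ ]*[0-9]{3,3}[ ]*[-]{0,1}[ ]*[0-9]{4,4}[ ]*$/";
    // 3 digits + 4 digits only (accepted when $useareacode is false)
    $local = "/^[ ]*[0-9]{3,3}[ ]*[-]{0,1}[ ]*[0-9]{4,4}[ ]*$/";

    if (preg_match($full, $phonenumber) || (!$useareacode && preg_match($local, $phonenumber))) {
        return preg_replace("/[^0-9]/", "", $phonenumber); // keep only the digits
    }
    return false;
}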
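To see what the function accepts, here is a short usage sketch. The function is repeated so the snippet runs on its own, and the sample inputs are made up.

```php
<?php
// Copy of the function above so this snippet runs standalone.
function validate_USphone($phonenumber, $useareacode = true)
{
    $full  = "/^[ ]*[(]{0,1}[ ]*[0-9]{3,3}[ ]*[)]{0,1}[-]{0,1}[ ]*[0-9]{3,3}[ ]*[-]{0,1}[ ]*[0-9]{4,4}[ ]*$/";
    $local = "/^[ ]*[0-9]{3,3}[ ]*[-]{0,1}[ ]*[0-9]{4,4}[ ]*$/";

    if (preg_match($full, $phonenumber) || (!$useareacode && preg_match($local, $phonenumber))) {
        return preg_replace("/[^0-9]/", "", $phonenumber);
    }
    return false;
}

var_dump(validate_USphone('(123) 456-7890'));  // string "1234567890"
var_dump(validate_USphone('123-4567'));        // false: area code required by default
var_dump(validate_USphone('123-4567', false)); // string "1234567"
var_dump(validate_USphone('123.456.7890'));    // false: periods are not allowed
```

Because both patterns are anchored with ^ and $, the function checks a whole string; it does not find numbers embedded in surrounding HTML.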
Revent