A slightly different approach might also be useful to you. I've used it before for telephone numbers: instead of only validating the input, pull out the needed information and reformat it. You may have requirements that don't fit this solution, but I'd like to suggest it anyway.
Using this match expression:
(?i)^\D*1?\D*([2-9])\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)[^x]*?\s*(?:(?:e?(x)(?:\.|t\.?|tension)?)\D*(\d+))?.*$
and this replace expression:
($1$2$3) $4$5$6-$7$8$9$10 $11$12
you should be able to reformat these inputs as indicated:
Input                          Output
-----------------------------  --------------------------------
"1323-456-7890 540"            "(323) 456-7890 "
"8648217634"                   "(864) 821-7634 "
"453453453322"                 "(453) 453-4533 "
"@404-327-4532"                "(404) 327-4532 "
"172830923423456"              "(728) 309-2342 "
"17283092342x3456"             "(728) 309-2342 x3456"
"jh345gjk26k65g3245"           "(345) 266-5324 "
"jh3g24235h2g3j5h3x245324"     "(324) 235-2353 x245324"
"12345678925x14"               "(234) 567-8925 x14"
"+1 (322)485-9321"             "(322) 485-9321 "
"804.555.1234"                 "(804) 555-1234 "
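As a rough illustration of how the match and replace expressions could be applied in code, here is a minimal Python sketch. The function name `reformat_phone` is made up for this example, and the replace step is written as string formatting over the capture groups (digits in groups 1 to 10, the "x" in group 11, the extension digits in group 12) rather than as a `$n` replacement string.

```python
import re

# Same match expression as above; the nine repeated "\D*(\d)" pieces are
# generated instead of typed out, which yields the identical pattern.
PHONE_RE = re.compile(
    r"(?i)^\D*1?\D*([2-9])"
    + r"\D*(\d)" * 9
    + r"[^x]*?\s*(?:(?:e?(x)(?:\.|t\.?|tension)?)\D*(\d+))?.*$"
)


def reformat_phone(raw):
    """Return '(AAA) PPP-NNNN xEXT', or None if ten usable digits are not found."""
    m = PHONE_RE.match(raw)
    if m is None:
        return None
    g = m.groups(default="")  # unmatched extension groups become ""
    area = "".join(g[0:3])
    prefix = "".join(g[3:6])
    line = "".join(g[6:10])
    # g[10] is the captured "x", g[11] the extension digits (both may be "").
    return f"({area}) {prefix}-{line} {g[10]}{g[11]}"


for sample in ["1323-456-7890 540", "17283092342x3456", "+1 (322)485-9321"]:
    print(repr(sample), "->", repr(reformat_phone(sample)))
```

As in the table, a number without an extension keeps the trailing space; strip it afterwards if that matters for your storage format.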
I'll grant you it's not the most efficient expression, but an inefficient regex is usually not a problem when it runs on a short piece of text, especially when it is written with some knowledge and a small amount of care.
To break down the parsing expression a little bit:
(?i)^\D*1?\D*                        # mode=ignore case; optional "1" in the beginning
([2-9])\D*(\d)\D*(\d)\D*             # three digits* with anything in between
(\d)\D*(\d)\D*(\d)\D*                # three more digits with anything in between
(\d)\D*(\d)\D*(\d)\D*(\d)[^x]*?      # four more digits with anything in between
\s*                                  # optional whitespace
(?:(?:e?(x)(?:\.|t\.?|tension)?)     # extension indicator (see below)
\D*(\d+))?                           # optional anything before a series of digits
.*$                                  # and anything else to the end of the string
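If your regex flavor has a free-spacing mode, the commented breakdown can be kept as the actual pattern. The following is a sketch in Python using `re.VERBOSE`; the name `PHONE_VERBOSE` and the sample input are only illustrative, and the `(?i)` prefix is expressed through `re.IGNORECASE` instead.

```python
import re

PHONE_VERBOSE = re.compile(
    r"""
    ^\D*1?\D*                          # optional "1" in the beginning
    ([2-9])\D*(\d)\D*(\d)\D*           # three digits, the first one 2-9
    (\d)\D*(\d)\D*(\d)\D*              # three more digits
    (\d)\D*(\d)\D*(\d)\D*(\d)[^x]*?    # four more digits
    \s*                                # optional whitespace
    (?:(?:e?(x)(?:\.|t\.?|tension)?)   # extension indicator
    \D*(\d+))?                         # extension digits
    .*$                                # anything else to the end
    """,
    re.IGNORECASE | re.VERBOSE,
)

m = PHONE_VERBOSE.match("804.555.1234 ext. 77")
digits = "".join(m.group(*range(1, 11)))  # groups 1..10 are the ten digits
print(digits, m.group(11), m.group(12))
```

This works here because the pattern contains no literal spaces or `#` characters, which free-spacing mode would otherwise ignore or treat as comment starts.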
* The first of these three digits cannot be 0 or 1.

The extension indicator can be x, ex, xt, or ext (all of which can have a period at the end), or extension or xtension (which cannot have a period at the end).
As written, the extension itself (the digits, that is) has to be a contiguous series of digits, but it usually is, as your given expression also assumes.
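To check which spellings the indicator accepts, the indicator part of the expression can be tried on its own. This is only a small Python sketch; `INDICATOR`, `is_indicator`, and the two sample lists are names invented for the example.

```python
import re

# Only the extension-indicator part of the match expression, in isolation.
INDICATOR = re.compile(r"e?(x)(?:\.|t\.?|tension)?", re.IGNORECASE)

accepted = ["x", "ex", "xt", "ext", "x.", "ex.", "xt.", "ext.",
            "extension", "xtension"]
rejected = ["extension.", "extn", "e"]


def is_indicator(text):
    """True if the whole string is one of the recognized indicator spellings."""
    return INDICATOR.fullmatch(text) is not None


print([s for s in accepted if is_indicator(s)])
print([s for s in rejected if is_indicator(s)])
```

Note that inside the full expression the `\D*` following the indicator would still skip a stray period after "extension", so this strictness applies to the indicator alone, not to what the complete pattern tolerates.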
The idea is to use the regex engine to pull out the first 10 digits, where the first of them may not be "0" or "1", because domestic U.S. telephone numbers do not start with those. A leading "1" is only a switch: whether it is needed depends on the phone you're typing the number into, not on the destination phone, so it is skipped. The expression then looks for an 'x' following those digits, captures it, and captures the first contiguous string of digits after it.
This allows considerable tolerance in the formatting of the input while stripping out harmful data or meta-characters, and it produces a consistently formatted telephone number (something that is often appreciated on many levels).