Disclaimer
Since several answers have already addressed the greater efficiency of string builders, etc., I wanted to show you how it could be done with regex and address the benefits of using this approach.
One REGEX Solution
Using this matching regex (similar to Alan Moore's expression):
(.{3})(.{3})(.{4})
allows you to capture exactly 10 characters in 3 groups (of 3, 3, and 4 characters), then use a replacement expression that references those groups and adds the extra characters:
($1) $2-$3
thus producing the format you requested. Of course, . matches any character, including punctuation and letters, which is a reason to use \d (written as \\d inside a Java string literal) rather than the . wildcard.
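A minimal Java sketch of the simple approach, assuming the input is already a bare string of digits. The class and method names are hypothetical, and the pattern uses the \d form suggested above instead of the . wildcard:

```java
public class SimplePhoneFormat {
    // Three capture groups of exactly 3, 3 and 4 digits.
    // "\\d" is how the regex token \d is written in a Java string literal.
    static final String MATCH = "(\\d{3})(\\d{3})(\\d{4})";

    // Replacement template: $1, $2 and $3 refer to the capture groups above.
    static final String REPLACE = "($1) $2-$3";

    static String format(String digits) {
        return digits.replaceFirst(MATCH, REPLACE);
    }

    public static void main(String[] args) {
        System.out.println(format("5551234567")); // (555) 123-4567
    }
}
```

Note that the pattern is not anchored, so on longer input it reformats the first run of ten consecutive digits and leaves the rest of the string in place.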
Why REGEX?
The potential advantage of a regex approach to something like this is that the "logic" of the string manipulation is compressed into strings. Because that logic lives in a string of characters rather than in compiled code, the matching and replacement expressions can be stored in a database, where an experienced user of the system can more easily modify, update, or customize them. This makes the system more complex on several levels, but gives users considerably more flexibility.
With the other approaches (string manipulation), changing the formatting so that it produces (555)123-4567 or 555.123.4567 instead of your specified (555) 123-4567 would essentially not be possible through the user interface alone. With the regex approach, the modification would be as simple as changing ($1) $2-$3 (in the database or similar store) into $1.$2.$3 or ($1)$2-$3, as appropriate.
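As a rough illustration of keeping the format in data rather than code, the sketch below uses an in-memory Map as a hypothetical stand-in for the database table. The template names are made up for this example, and Map.of requires Java 9 or later:

```java
import java.util.Map;

public class ConfigurableFormat {
    // Hypothetical stand-in for a database table or config file of
    // replacement templates; editing an entry changes the output format
    // without touching the Java code.
    static final Map<String, String> TEMPLATES = Map.of(
        "default", "($1) $2-$3",
        "dotted",  "$1.$2.$3",
        "compact", "($1)$2-$3");

    static final String MATCH = "(\\d{3})(\\d{3})(\\d{4})";

    static String format(String digits, String templateName) {
        return digits.replaceFirst(MATCH, TEMPLATES.get(templateName));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"default", "dotted", "compact"}) {
            System.out.println(name + ": " + format("5551234567", name));
        }
    }
}
```

Switching formats is then a matter of changing which stored template is used, with no change to the matching code.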
If you wanted your system to accept "dirtier" input that includes various attempts at formatting, such as 555-123.4567, and reformat it into something consistent, you could write a string-manipulation algorithm capable of this and recompile the application. With a regex solution, however, no system overhaul is necessary; you merely change the parsing and replacement expressions, like so (this may be a little complex for beginners to understand right away):
^\D*1?\D*([2-9])\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d).*$
($1$2$3) $4$5$6-$7$8$9$10
This gives the program a significant "upgrade" in capability, as the following reformatting shows:
"Input"                        "Output"
-----------------------------  --------------------------------
"1323-456-7890 540"            "(323) 456-7890"
"8648217634"                   "(864) 821-7634"
"453453453322"                 "(453) 453-4533"
"@404-327-4532"                "(404) 327-4532"
"172830923423456"              "(728) 309-2342"
"jh345gjk26k65g3245"           "(345) 266-5324"
"jh3g24235h2g3j5h3"            "(324) 235-2353"
"12345678925x14"               "(234) 567-8925"
"+1 (322)485-9321"             "(322) 485-9321"
"804.555.1234"                 "(804) 555-1234"
"08648217634"                  <no match or reformatting>
As you can see, it is very "tolerant" of input "formatting": it knows that a leading 1 should be ignored and that an area code beginning with 0 is invalid and should be rejected (the pattern simply fails to match), all stored in a single string.
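A sketch of how the tolerant expression might be applied in Java (class and method names are hypothetical). Every backslash is doubled for the string literal, the Pattern is compiled once rather than on every call, and a non-matching input comes back as an empty Optional, which is one possible way to surface the "error" for an invalid number:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TolerantPhoneFormat {
    // The "dirty input" pattern, with every backslash doubled for the Java
    // string literal. Group 1 is the first area-code digit (2-9 only);
    // groups 2 through 10 capture the remaining nine digits.
    static final Pattern PHONE = Pattern.compile(
        "^\\D*1?\\D*([2-9])\\D*(\\d)\\D*(\\d)\\D*(\\d)\\D*(\\d)"
        + "\\D*(\\d)\\D*(\\d)\\D*(\\d)\\D*(\\d)\\D*(\\d).*$");

    // Java reads $10 as group 10 because the pattern has ten groups.
    static final String REPLACE = "($1$2$3) $4$5$6-$7$8$9$10";

    // Returns the reformatted number, or an empty Optional when the input
    // does not match (for example, an area code starting with 0).
    static Optional<String> format(String input) {
        Matcher m = PHONE.matcher(input);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(m.replaceFirst(REPLACE));
    }

    public static void main(String[] args) {
        String[] inputs = {
            "1323-456-7890 540", "jh345gjk26k65g3245",
            "+1 (322)485-9321", "08648217634"
        };
        for (String in : inputs) {
            System.out.println(in + " -> " + format(in).orElse("<no match>"));
        }
    }
}
```

Returning an Optional instead of the unchanged input is a design choice for this sketch; String.replaceAll with the same pattern would instead leave non-matching input as it was.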
The question comes down to performance versus potential for customization. String manipulation is generally faster than regex, but future customization requires a recompile rather than a simple change to a stored string. That said, some things can't be expressed very well in regex (or even as readably as the change above), and some things are not possible with regex at all.
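For comparison, a hand-written formatter for the simple ten-digit case might look like the following. It is a sketch under the assumption that the input must be exactly ten ASCII digits, it is not taken from the other answers, and it has not been benchmarked here. The output layout is baked into the code, so switching to 555.123.4567 would mean editing and recompiling it:

```java
public class ManualPhoneFormat {
    // Hard-coded equivalent of the "($1) $2-$3" template for input that is
    // exactly ten ASCII digits; anything else is rejected.
    static String format(String digits) {
        if (digits.length() != 10
                || !digits.chars().allMatch(c -> c >= '0' && c <= '9')) {
            throw new IllegalArgumentException("expected exactly 10 digits: " + digits);
        }
        // Capacity 14 = 10 digits plus "(", ")", " " and "-".
        return new StringBuilder(14)
            .append('(').append(digits, 0, 3).append(") ")
            .append(digits, 3, 6).append('-')
            .append(digits, 6, 10)
            .toString();
    }

    public static void main(String[] args) {
        System.out.println(format("5551234567")); // (555) 123-4567
    }
}
```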
TL;DR:
Regex lets you encode a parsing algorithm in a relatively short string, which can be stored externally and modified without recompiling. Simpler, more focused string-manipulation functions are more efficient and can sometimes accomplish more than regex can. The key is to understand both tools and the requirements of the application, and use the one most appropriate for the situation.