0

I have a file that contains phone numbers in any of the following formats:

12.456.7890
12-456-7890    
123.456.7890
(123)456.7890
(123).456.7890
123-456-7890
(123)-456-7890
(123)456-7890

Is it possible to use a regex substitution so that the output number is always in the format (123)456-7890 or (12)456-7890?

Dren

2 Answers

2

Yes, it is:

s/\(?(\d\d\d)\)?[-.]?(\d\d\d)[-.]?(\d\d\d\d)/($1)$2-$3/g

I should mention that the above will also match the following two malformed inputs:

123)456.7890
(123456.7890
José Castro
  • @Dren - Yeah but will the format always work to your solution. –  Aug 02 '16 at 15:33
  • @sln Pardon, my test showed a loophole in the solution: it does not work if a digit is missing from the first part, for example 23.456.7890, but it works for 123.456.789 – Dren Aug 02 '16 at 15:38
  • @ikegami I modified the question to include the patterns mentioned – Dren Aug 02 '16 at 15:58
  • @Jose I was able to modify your solution to handle missing digits in the first part of the pattern. It now looks like this: s/\(?(\d+)\)?[-.]?(\d\d\d)[-.]?(\d\d\d)/($1)$2-$3/g; – Dren Aug 02 '16 at 16:11
2

You can do this using two substitutions:

perl -lpe 's/\D//g; s/(\d{3})(\d{3})(\d{4})/($1)$2-$3/' file

The first one removes all characters that aren't numeric. The second one inserts the desired characters between each group.

You should take into account that this approach will make a mess of any lines that aren't like the ones in your sample input. One means of protecting yourself could be something like this:

if ((@a = /\d/g) == 10) {
    # perform substitutions
}

That is, ensure that the line contains exactly 10 digits before proceeding. The check has to run before the first substitution, while the line is still intact.

Tom Fenech