grep with regex for phone number

Question

I would like to extract phone numbers from a file. I know the numbers come in different forms. I can handle any single form, but I don't know how to write one regex that covers all of them. For example:

xxx-xxx-xxxx
(xxx)xxx-xxxx
xxx xxx xxxx
xxxxxxxxxx

I can only handle forms 1, 3, and 4 together:

grep '[0-9]\{3\}[ -]\?[0-9]\{3\}[ -]\?[0-9]\{4\}' file

Is there a single regex that can handle all four of these forms?

You would have to handle 2 separately via alternation (|). The issue is that using basic regex there's no way to tell whether the parens are balanced otherwise. — Joel, Feb 15 '10 at 23:32
Check out Regexr for regex help... http://www.gskinner.com/RegExr/ — Moshe, Feb 16 '10 at 01:10

score 19 · Accepted Answer · answered Feb 16 '10 at 01:09

grep '\(([0-9]\{3\})\|[0-9]\{3\}\)[ -]\?[0-9]\{3\}[ -]\?[0-9]\{4\}' file

Explanation:

([0-9]\{3\}) three digits inside parentheses

\| or

[0-9]\{3\} three digits not inside parens

...with grouping parentheses - \(...\) - around the alternation so the rest of the regex behaves the same no matter which alternative matches.
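
A quick way to try the pattern is to feed it the four forms on stdin. This is only a sketch: the function name match_phones and the sample numbers are made up, and it assumes GNU grep, since \? and \| in a basic regex are GNU extensions.

```shell
# Hypothetical helper: keep only stdin lines that contain a phone number.
# The pattern is the one from this answer (GNU grep basic regex).
match_phones() {
    grep '\(([0-9]\{3\})\|[0-9]\{3\}\)[ -]\?[0-9]\{3\}[ -]\?[0-9]\{4\}'
}

# Made-up sample input: the four forms plus one line without a number.
printf '%s\n' '555-123-4567' '(555)123-4567' '555 123 4567' '5551234567' 'no number here' | match_phones
# prints the first four lines and drops 'no number here'
```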

Alan Moore

The [slight] problem with this regex is that it also matches a number with more than 4 digits in the last part, e.g. 123-123-12345, or a number with more than 10 digits in it. This: `grep '\(\(([0-9]\{3\})\|[0-9]\{3\}\)[ -]\?\)\{2\}[0-9]\{4\} '` should handle that nicely. See my reply below for explanation. Cheers!! – MacUsers Apr 07 '13 at 11:23
@MacUsers: Good point. The OP only asked how to get the regex to match everything it should match, and I answered that. Getting it to *not* match the things it shouldn't is much more interesting. – Alan Moore Apr 07 '13 at 21:30
Note that another trick used here is the sequence "[ -]\?". This matches a space, a hyphen, or nothing at all between the groups of digits in the phone number. – Max West Apr 25 '15 at 17:15

score 9 · Answer 2 · answered Apr 04 '13 at 17:21

There are usually four patterns of phone numbers

1. xxx-xxx-xxxx         grep -o '[0-9]\{3\}\-[0-9]\{3\}\-[0-9]\{4\}'  file.txt
2. (xxx)xxx-xxxx        grep -o '([0-9]\{3\})[0-9]\{3\}\-[0-9]\{4\}'  file.txt
3. xxx xxx xxxx         grep -o '[0-9]\{3\}\s[0-9]\{3\}\s[0-9]\{4\}'  file.txt
4. xxxxxxxxxx           grep -o '[0-9]\{10\}' file.txt

In all

grep -o '\([0-9]\{3\}\-[0-9]\{3\}\-[0-9]\{4\}\)\|\(([0-9]\{3\})[0-9]\{3\}\-[0-9]\{4\}\)\|\([0-9]\{10\}\)\|\([0-9]\{3\}\s[0-9]\{3\}\s[0-9]\{4\}\)' file.txt

Of course, one could simplify the regex above, but we can also leave that simplification to grep itself.
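
For what it's worth, the same four alternatives can be spelled as an extended regex (grep -E), which drops most of the backslashes. This is a sketch, not a tested replacement: the function name extract_phones_ere and the sample lines are made up, and a literal space stands in for \s.

```shell
# Hypothetical helper: print only the matched phone numbers found on stdin.
# Four alternatives, one per format, in ERE syntax.
extract_phones_ere() {
    grep -oE '[0-9]{3}-[0-9]{3}-[0-9]{4}|\([0-9]{3}\)[0-9]{3}-[0-9]{4}|[0-9]{3} [0-9]{3} [0-9]{4}|[0-9]{10}'
}

# Made-up sample input; the third line mixes separators and is not matched.
printf '%s\n' 'call 555-123-4567 today' 'fax: (555)123-4567' 'mixed 555-123 4567' | extract_phones_ere
# prints:
# 555-123-4567
# (555)123-4567
```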

MacUsers · Answer 3 · 2014-06-19T17:05:13.140

This is just a modified version of Alan Moore's solution. It guards against the edge cases where the last part of the number has more than four digits or the total number of digits is more than 10:

grep '\(\(([0-9]\{3\})\|[0-9]\{3\}\)[ -]\?\)\{2\}[0-9]\{4\} '

Explanation:

\(([0-9]\{3\})\|[0-9]\{3\}\) matches exactly three digits (e.g. 234), with or without surrounding parentheses. \| performs the 'OR' operation.
The outer \( ... \) groups the above together with an optional separator: [ -]\? matches a space, a - or nothing at all.
The \{2\} matches exactly two occurrences of the above.
The [0-9]\{4\} ' matches exactly one four-digit number followed by a space.

And it's a bit shorter as well. Tested on RHEL and Ubuntu. Cheers!!
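
Here is a small sketch of how that pattern behaves, assuming GNU grep. The function name match_strict and the sample lines are made up. Because the pattern ends in a literal space, a number at the very end of a line is not matched either.

```shell
# Hypothetical helper: filter stdin with the pattern from this answer.
# Note the trailing space inside the quotes; it is part of the regex.
match_strict() {
    grep '\(\(([0-9]\{3\})\|[0-9]\{3\}\)[ -]\?\)\{2\}[0-9]\{4\} '
}

# Made-up sample input:
#   line 1: valid number followed by a space   -> printed
#   line 2: five digits in the last group      -> not printed
#   line 3: valid number at end of line        -> not printed (no trailing space)
printf '%s\n' '123-123-1234 ok' '123-123-12345 too long' '123-123-1234' | match_strict
# prints: 123-123-1234 ok
```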

awesome explanation! Saves me time to go look for a tutorial! — Fisher Coder, Jul 02 '16 at 16:56

score 2 · Answer 4 · answered Feb 15 '10 at 23:09

You can just OR (|) your regexes together -- will be more readable that way too!

Arkady

can you show me an example? I know OR(|) might work, but I didn't figure out how. – skydoor Feb 16 '10 at 00:34

score 1 · Answer 5 · answered Feb 15 '10 at 23:09

My first thought is that you may find it easier to see if your candidate number matches against one of four regular expressions. That will be easier to develop/debug, especially as/when you have to handle additional formats in the future.
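
One way to sketch this with grep is to pass one -e option per format: a line is printed if any of the patterns matches, and each pattern can be tested or extended on its own. The function name match_any_format and the sample input are made up; the four patterns are just the four forms from the question written as extended regexes.

```shell
# Hypothetical helper: one pattern per phone number format.
# Order of the -e options: xxx-xxx-xxxx, (xxx)xxx-xxxx, xxx xxx xxxx, xxxxxxxxxx.
match_any_format() {
    grep -E \
        -e '[0-9]{3}-[0-9]{3}-[0-9]{4}' \
        -e '\([0-9]{3}\)[0-9]{3}-[0-9]{4}' \
        -e '[0-9]{3} [0-9]{3} [0-9]{4}' \
        -e '[0-9]{10}'
}

# Made-up sample input: two supported formats and one unsupported one.
printf '%s\n' '555-123-4567' '5551234567' '555.123.4567' | match_any_format
# prints:
# 555-123-4567
# 5551234567
```

Supporting a new format later (say, dots as separators) would then mean adding one more -e line rather than editing a long combined regex.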

Brian Agnew

score 1 · Answer 6 · answered Feb 16 '10 at 00:58

grep -P '[0-9]{3}-[0-9]{3}-[0-9]{4}|[0-9]{3}\ [0-9]{3}\ [0-9]{4}|[0-9]{10}|\([0-9]{3}\)[0-9]{3}-[0-9]{4}'

D W

score 1 · Answer 7 · edited Dec 02 '15 at 11:30

We can spell out each accepted phone number format one by one and join them with an OR condition. This is tedious to write but more likely to work well.

grep '^[0-9]\{10\}$\|^[0-9]\{3\}[-][0-9]\{3\}[-][0-9]\{4\}$\|^[0-9]\{3\}[ ][0-9]\{3\}[ ][0-9]\{4\}$\|^[(][0-9]\{3\}[)][0-9]\{3\}[-][0-9]\{4\}$' phone_number.txt

It returns every line in one of the listed formats:

920-702-9999
(920)702-9999
920 702 9999
9207029999

score 0 · Answer 8 · edited Jan 04 '13 at 13:37

Try this one:

^(\d{10}|((([0-9]{3})\s){2})[0-9]{4}|((([0-9]{3})\-){2})[0-9]{4}|([(][0-9]{3}[)])[0-9]{3}[-][0-9]{4})$

This only applies to the formats you mentioned above:

xxxxxxxxxx
xxx xxx xxxx
xxx-xxx-xxxx
(xxx)xxx-xxxx

answered Jan 04 '13 at 13:13

Tahir khan

edited by Nikola

score 0 · Answer 9 · answered Aug 08 '16 at 18:12

\+?(1[ -])?(\(\d{3}\)[ -]|(\d{3}[ -]?)){2}\d{4}

works for:

123-678-1234

123 678 1234

(123)-678-1234

+1-(123)-678-1234

1-(123)-678-1234

1 123 678 1234

1 (123) 678 1234

gein

score 0 · Answer 10 · answered Sep 28 '16 at 13:08

grep -oE '\(?\<[0-9]{3}[-) ]?[0-9]{3}[ -]?[0-9]{4}\>'

Matches all your formats.

The \< and \> word boundaries prevent matching numbers that are too long, such as 123-123-12345 or 1234-123-1234
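
A short sketch of the word-boundary behaviour, assuming GNU grep (\< and \> are GNU extensions). The function name extract_bounded and the sample numbers are made up.

```shell
# Hypothetical helper: print only the matched numbers, using the pattern above.
extract_bounded() {
    grep -oE '\(?\<[0-9]{3}[-) ]?[0-9]{3}[ -]?[0-9]{4}\>'
}

# Made-up sample input: two valid numbers, then one with too many trailing
# digits and one with too many leading digits.
printf '%s\n' '123-123-1234' '(123)123-1234' '123-123-12345' '1234-123-1234' | extract_bounded
# prints:
# 123-123-1234
# (123)123-1234
```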

glenn jackman

score -2 · Answer 11 · answered Feb 15 '10 at 23:16

I got this:

debian:tmp$ cat p.txt
333-444-5555
(333)333-6666
123 456 7890
1234567890
debian:tmp$ egrep '\(?[0-9]{3}[ )-]?[0-9]{3}[ -]?[0-9]{4}' p.txt
333-444-5555
(333)333-6666
123 456 7890
1234567890
debian:tmp$ egrep --version
GNU grep 2.5.3

Copyright (C) 1988, 1992-2002, 2004, 2005  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

debian:tmp$

This will match (333-444-555 also. — Joel, Feb 15 '10 at 23:30
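
To illustrate the comment about unbalanced parentheses, here is a sketch using grep -o, assuming GNU grep. The function names loose and alternation are made up. The first pattern is the one from this answer. The second is an extended-regex spelling of the alternation approach, where an opening paren only counts if its closing paren follows the area code.

```shell
# Hypothetical helper: the character-class pattern from this answer.
loose() {
    grep -oE '\(?[0-9]{3}[ )-]?[0-9]{3}[ -]?[0-9]{4}'
}

# Hypothetical helper: alternation between "(xxx)" and "xxx" for the area code.
alternation() {
    grep -oE '(\([0-9]{3}\)|[0-9]{3})[ -]?[0-9]{3}[ -]?[0-9]{4}'
}

# Made-up input with an opening paren that is never closed.
printf '%s\n' '(333-444-5555' | loose        # prints: (333-444-5555
printf '%s\n' '(333-444-5555' | alternation  # prints: 333-444-5555
```

Both patterns still report the line as containing a number. The difference is only whether the stray paren is treated as part of it.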
