
I'm trying to work out how to extract phone numbers that have been left inside a string.

e.g.

 "Hi Han, this is Chewie, Could you give me a call on 02031234567"
 "Hi Han, this is Chewie, Could you give me a call on +442031234567"
 "Hi Han, this is Chewie, Could you give me a call on +44 (0) 203 123 4567"
 "Hi Han, this is Chewie, Could you give me a call on 0207-123-4567"
 "Hi Han, this is Chewie, Could you give me a call on 02031234567 OR +44207-1234567"

I'd also like to be able to consistently replace any one of them with some other item (e.g. some text, or a link).

I'm assuming this calls for a regex-type approach (I'm already doing something similar with email addresses, which works well).

I've got to

 text.scan(/([^A-Z|^"]{6,})/i)

That leaves a leading space on each match, which I can't work out how to drop (I'd appreciate help there). Is there a standard way of doing this that people use?

It also wraps each match in its own array, which isn't particularly helpful. For example, if there were multiple numbers I get

[["02031234567"], ["+44207-1234567"]]

as opposed to

["02031234567","+44207-1234567"]
Carpela
  • What is a "phone number"? If you can define that with a singular regular expression I think you'll win a Nobel Prize. – tadman Jun 24 '15 at 16:29
  • Not worrying about getting it absolutely right. But there's got to be a decent way of getting close... – Carpela Jun 24 '15 at 17:03
  • I'm very tempted to mark this as a duplicate, but will leave that to others. You need to read "[A comprehensive regex for phone number validation](http://stackoverflow.com/q/123559/128421)". What you're trying to do is difficult, as there is a wide range of possible formats and people can arbitrarily or accidentally alter their number format. So strip everything but the characters that can only exist in a phone number and go from there. – the Tin Man Jun 24 '15 at 17:21
  • I should probably clarify: I don't need this to turn the matches into proper phone numbers. I'm using the phony gem for that, which will do it far better than any regex. It's more that I'm trying to sanitize the string and remove the numbers from it, knowing what I'm removing so I can replace it with something sensible. – Carpela Jun 24 '15 at 21:01

4 Answers


Handling the third use case, with spaces inside the number, is difficult. I think the only way to meet that acceptance criterion is to strip the whitespace with #gsub before calling #scan.

Thus:

text.gsub(/\s+/, "").scan(/([^A-Z|^"|^\s]{6,})/i)
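On the nested arrays mentioned in the question: String#scan returns one sub-array per match whenever the pattern contains a capture group, and a flat array of strings when it does not. A minimal sketch using the question's fifth sample string (the variable names are only for illustration):

```ruby
text = "Hi Han, this is Chewie, Could you give me a call on 02031234567 OR +44207-1234567"

# Strip all whitespace first, as in the line above.
squeezed = text.gsub(/\s+/, "")

# With a capture group, scan returns one sub-array per match.
nested = squeezed.scan(/([^A-Z|^"|^\s]{6,})/i)

# Same pattern without the parentheses: scan returns a flat array of strings.
flat = squeezed.scan(/[^A-Z|^"|^\s]{6,}/i)

p nested
p flat
```

Calling .flatten on the nested result would give the same flat array if the capture group needs to stay.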
the Tin Man
Collin Graves
  • It's almost there, but it deletes all the spaces in the numbers before the scan (I want to be able to find the string later and replace it with something, so that won't help). I just want to drop the leading space and keep the remaining string exactly as it started. – Carpela Jul 13 '15 at 12:34

The following code will extract all the numbers for you:

text.scan(/(?<=[ ])[\d \-+()]+$|(?<=[ ])[\d \-+()]+(?=[ ]\w)/)

For the examples you supplied this results in:

["02031234567"]
["+442031234567"]
["+44 (0) 203 123 4567"]
["0207-123-4567"]
["02031234567", "+44207-1234567"]

To understand this regex, what we are matching is:

  • [\d \-+()]+ which is a sequence of one or more digits, spaces, hyphens, plus signs, or opening or closing parentheses (in any order - NB regex is greedy by default, so it will match as many of these characters next to each other as possible)
  • that must be preceded by a space (?<=[ ]) - NB the space in the positive look-behind is not captured, and therefore this makes sure that there are no leading spaces in the results
  • and is either at the end of the string $, or | is followed by a space then a word character (?=[ ]\w) (NB this lookahead is not captured)
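Since the question also asks about replacing each number, the same pattern can be passed to String#gsub with a block. This is only a sketch: the `tel:` link markup and the variable names are assumptions for illustration, not something the question specifies.

```ruby
# Pattern copied verbatim from the answer above.
phone_pattern = /(?<=[ ])[\d \-+()]+$|(?<=[ ])[\d \-+()]+(?=[ ]\w)/

text = "Hi Han, this is Chewie, Could you give me a call on 02031234567 OR +44207-1234567"

# gsub yields each matched substring to the block and substitutes
# the block's return value in its place.
linked = text.gsub(phone_pattern) do |number|
  # Keep only digits and "+" for the link target; show the number as written.
  dialable = number.delete("^0-9+")
  "<a href=\"tel:#{dialable}\">#{number}</a>"
end

puts linked
```

The surrounding text is left untouched because the look-behind and lookahead are not part of the match. Note that simply deleting separators does not normalise a form like "+44 (0) 203 123 4567"; that is better left to a library such as Phony.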
RAWdaMedia
  • Should the number necessarily be preceded by a space? e.g. "Give me a call (07123456789)". The other piece is definitely an improvement though. I had to add a special piece to remove a leading colon (as many emails have something like "T: 0123456789"), which was getting picked up by the previous one. – Carpela Jul 14 '15 at 11:20
  • Also, might want to add "." as a potential separator (apparently popular across the pond). Just added above. – Carpela Jul 14 '15 at 11:25
  • I'm also having an issue where a piece of punctuation is followed by a phone number, e.g. "Telephone - 02071234567" returns "- 02071234567" as the number. – Carpela Jul 14 '15 at 12:39
  • I tried to cover the cases that you gave as examples. I guess it is important to think of all the edge cases when trying to write tests... here is the quick code that I used to test whether the regex works: `samples = ["Give me a call (07123456789)","T: 0123456789"]` `for text in samples do` `puts text.scan(/(?<=[ \-(])[+\(\d][\d \-+()]+$|(?<=[ \-(])[+\(\d][\d \-+()]+(?=[ ]\w|[.\)])/).inspect` `end` NB some tweaking to get rid of a leading "-", but I will let you figure out how to get rid of brackets around the whole number if that is important to you. – RAWdaMedia Jul 16 '15 at 07:21
  • It is starting to be really confusing to look at, so you would want to have really good tests to make sure that it works. Ideally break your constructor of the regex into smaller steps that you could then comment properly, so that months down the track you would still know what is going on!!!! – RAWdaMedia Jul 16 '15 at 07:36

This pattern will get rid of the space but not match your third case with spaces:

/([^A-Z|^"|^\s]{6,})/i
Patrick Murphy

This is what I arrived at in the end, in case it helps somebody:

numbers = text.scan(/([^A-Z|^"]{6,})/i).collect{|x| x[0].strip }

That gives me an array of

["+442031234567", "02031234567"]

I'm sure there is a more elegant way of doing this, and you'd possibly want to check the numbers for likelihood of being phone-like, e.g. using the brilliant Phony gem (the PhonyRails.normalize_number call below comes from the companion phony_rails gem).

numbers = text.scan(/([^A-Z|^"]{6,})/i).collect{|x| x[0].strip }
real_numbers = numbers.keep_if{|n| Phony.plausible? PhonyRails.normalize_number(n, default_country_code: "GB")}

Which should help exclude serial numbers or the like from being identified as numbers. You'll obviously want to change the country code to something relevant for you.
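If pulling in Phony isn't an option, a much cruder stand-in is to keep only candidates with a plausible digit count. This is a rough sketch rather than a replacement for Phony: the sample string is made up, the upper bound of 15 is the E.164 maximum, and the lower bound of 7 is purely an assumption.

```ruby
# Made-up sample containing a short reference code as well as two phone numbers.
text = 'Order ref 12-34-56, call me on 02031234567 or +44207-1234567'

# Same extraction as above: runs of six or more non-letter characters, stripped.
candidates = text.scan(/([^A-Z|^"]{6,})/i).collect { |x| x[0].strip }

# Keep candidates whose digit count falls in an assumed plausible range.
# 15 is the E.164 maximum; 7 is an arbitrary lower bound chosen for this sketch.
plausible = candidates.select { |n| (7..15).cover?(n.count("0-9")) }

p candidates
p plausible
```

The short reference code is extracted as a candidate but dropped by the filter because it has only six digits.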

Carpela