0

I have phone number list on our PhoneNos table

ID | PhoneNo
1 | +61 2 9666 8000 

We try to search this phone no into our Content table (ie. desc field)

The challenge is actualy:

The desc field is a text and the input can be any thing such as:

ContentID | Desc    
1 | bla bla ... +61 (02) 9666 8000 ... bla bla
2 | bla bla ... +61-2-9666-8000 bla bla
3 | bla bla ... +61 2 96668000 bla bla
4 | bla bla ... +61296668000 00116129668000 bla bla

or could be anything arranging from extra spacing such as

5 | bla bla ... +61  (02) 9666   8000 ... bla bla
6 | bla bla ... +61-2 9662 0382 ... bla bla

That's an Australia phone number BUT again it could be USA or any other countries SO it's not tight with 1 particular country.

This phone no have no pattern what so ever before and after this phone no. So it could be anything.

Is there anyway to handle this sort of thing easily? I can probably construct each condition above BUT I'm just wondering if there is a better solution.

dcpartners
  • 5,176
  • 13
  • 50
  • 73

3 Answers3

2

Just normalize the users input to a format that is easy to search i.e. "+ [ x ]". If the user enters additional spaces remove them. Add country code if necessary. Remove 00 from start and replace with +. You could even split the 'phone number into three columns to make searching easier.

Ed Heal
  • 59,252
  • 17
  • 87
  • 127
1

Why not just remove special symbols from the phone numbers and store them as just number strings?

The only case you need to consider is the +, because it replaces 00.

So basically, your records will have just numbers, your input will have just numbers. Just make sure you normalize the + to something, both in your database and the input.

What I would do is store them all with 00 instead of a +, so that when a search input with 00 comes through, it will work, as well as a search with a +. Hope this makes sense.

bogdansrc
  • 1,338
  • 2
  • 15
  • 28
  • The input for Content table is something that we can't control .. so the user can type anything. One of paragraph might have the phone no. That phone no obviously can be any format. We need to search that content against our phone list no. – dcpartners Oct 26 '12 at 03:11
1

My (highly uneducated) thought would be to use a regular expression replace (see here). Essentially strip everything in the content except for numbers and plus signs (feeling clunky yet? :) ), and then compare to your control string with the same processing (\\+\d+, basically). That makes the rather broad assumption that there will be no false positives created by another random string of numbers/characters matching your number (I imagine somewhat unlikely from a probability perspective, but always a possibility).

I was tinkering around with what I'm sure is highly inefficient, inelegant and likely incorrect solution, and realized that it won't handle the case with a leading 0 inside parentheses (since this doesn't seem to be present in other patterns). You can find it here if you're curious, but I think the regex solution may be the most efficient way to handle.

Community
  • 1
  • 1
RocketDonkey
  • 36,383
  • 7
  • 80
  • 84
  • @RocketDonky: I like your 2nd solution. Didn't thought strip all down by removing space etc etc. The only challenge I guess to detect 0011 61 9666 8000 ... some people they put this dial out to overseas code 0011 (different for each country) but anyway at least it's close enough. – dcpartners Oct 26 '12 at 05:29
  • Actually, it won't matter!!! if you striped down so the text will be: "blabla00116196668000blabla" at least it will be pickup that string. It's going to slim chance anyway to have phone number that has 001161 for instance – dcpartners Oct 26 '12 at 05:31
  • @dcalliances Yeah, there are probably a number of corner cases that doesn't cover (is my ugly American-ness showing? :) ). I guess a potential solution could be done on the control string side? Maybe adjust that to account for the various overseas code and then check those instances against the content? – RocketDonkey Oct 26 '12 at 05:32