1

I have phone number field to clean up and trying to find the junk number which are in this format patterns of numbers: like 000000 ,111111,222222,3333333,4444444,....

Phone
----------------
(444)333-7777
555.666.6666
333.555-7777
222-222-2222
1111111111
5104554535
(555)(353)(5343)

Phone(output i want to see is)

222-222-2222
1111111111

and second question i want to know the count of the records has special characters.from the above phone field i want to get 5 as count.

Your help is appreciated.

Vamsi Prabhala
  • 48,685
  • 4
  • 36
  • 58
swathi
  • 377
  • 2
  • 7
  • 13
  • 1
    what are the valid patterns? can you please explain it clearly? – Vamsi Prabhala Jun 23 '16 at 20:24
  • i am not sure about the valid patterns but i am sure if i have patterns like 1111111111 or 2222222222 or 3333333333 or (444)4444444....9999999999 or 0000000000... i am sure i need to check how many are like this in the table and eventually need to clean them.and any with special characters as well. – swathi Jun 23 '16 at 20:27
  • If your objective is to 'clean up' why not just remove non-numeric characters? Also, would you consider 8888888888 a 'junk' number? (It's valid) – Nick Jun 23 '16 at 20:40
  • Yes we can remove non numeric characters but as i mentioned my first criteria is to find the count of the records are like 111111111 or 2222222222 or 3333333333 or (444)4444444....9999999999 or 0000000000... Thanks for responding Nicarus. – swathi Jun 23 '16 at 20:45

4 Answers4

2

This works to find ten digit repeating numbers

    WITH cleaned_numbers AS (
      select regexp_replace(number, '[^0-9]', '', 'g') AS number
      from phone_numbers
    )
    select number
    from cleaned_numbers
    where number::text ~ '([0-9])(\1)(\1)(\1)(\1)(\1)(\1)(\1)(\1)(\1)'
Shayna
  • 716
  • 7
  • 10
  • Hi Shayna, i am not expecting the results i want can you please take a look at this. drop table if exists t1; create table t1 (phone text); insert into t1 (phone) values ('000-000-000'), ('(000)000-0000'), ('000-000-0000'), ('0000000001'),('0000000000'),('765-456-6656'); select * from t1; WITH cleaned_numbers AS ( select regexp_replace(Phone, '[^0-9]', '', 'g') AS phone from t1 ) select phone from cleaned_numbers where phone::text ~ '([0-9])(\1)(\1)(\1)(\1)(\1)(\1)(\1)(\1)(\1)' and phone <>'' order by phone I dont want value 765-456-6656 – swathi Jun 23 '16 at 21:35
  • @swathi I just ran your exact query and 765-456-6656 is not returned. Only the three ten digit repeating 0 numbers are returned.The query returns: phone ------------ 0000000000 0000000000 0000000000 (3 rows) – Shayna Jun 23 '16 at 21:42
  • i would like to get ('000-000-000') and ('0000000001') as well . because these are junk. – swathi Jun 23 '16 at 21:46
  • What is this statement doing? phone::text ~ '([0-9])(\1)(\1)(\1)(\1)(\1)(\1)(\1)(\1)' – swathi Jun 23 '16 at 22:04
  • @swathi the only definition I have seen for what you mean by junk numbers so far is "phone numbers which has all 10 digits same numbers" can you clarify what your definition of "junk numbers" is so I can cover all cases? – Shayna Jun 23 '16 at 22:05
  • Here are some junk data i see in the table Phone ----- 000-000-000 0000000000 (000)000-0000 000-000-0000 0000000001 0000000021 0000000441 and can you explain what does this condition does? phone::text ~ '([0-9])(\1)(\1)(\1)(\1)(\1)(\1)(\1)(\1)' Thank You,Swathi. – swathi Jun 24 '16 at 14:15
  • @swathi The ([0-9]) creates a group for a single digit and each (\1) refers to the preceding group, so it checks each time to see if the preceding digit is followed by the same value. So this regular expression only checks for ten digit repeating numbers. I can see that that the examples you have provided definitely look like junk phone numbers, but in order to write a regular expression that covers all cases you need to have a well-defined definition of a junk number. Can you explain what all of these examples have in common to make them identifiable as junk numbers? – Shayna Jun 24 '16 at 15:20
  • Thank you for helping me. I don't have any specific rules to confirm them as junk but i do know by seeing them they are junk. I assume if i have 1st 3 digits with different number and rest of the 7digits are same them that is valid number(2036666666).this can be a valid phone 8561111111, 8777777777,8595555555 even 18888888888 is valid phone number. – swathi Jun 24 '16 at 15:59
  • @swathi check out the second answer here for help on building your own custom regular expression for phone number validation to meet your needs: http://stackoverflow.com/questions/123559/a-comprehensive-regex-for-phone-number-validation – Shayna Jun 24 '16 at 16:25
0

Extending on @Shayna's excellent response, you can find the first part of your question using a regex looking for repeating numbers. Change the number between the braces to check for varying numbers of repeating digits:

with
  t1(phone) as (values ('000-000-000'::text), ('(000)000-0000'), ('000-000-0000'), ('0000000001'),('0000000000'),('765-456-6656')),
  cleaned_numbers AS ( select phone, regexp_replace(Phone, '[^0-9]', '', 'g') AS numeric_phone from t1 ) 
select * 
from cleaned_numbers 
where numeric_phone ~ '([0-9])\1{8}'
order by phone

Second part of your question, again use a regex to identify phones that have anything other than a digit or space (remove the space between brackets to find also phone numbers that have spaces, or add any symbols that you would consider valid, like brackets and dashes):

with
  t1(phone) as (values ('000-000-000'::text), ('(000)000-0000'), ('000-000-0000'), ('0000000001'),('0000000000'),('765-456-6656'))
select count(*)
from t1
where phone ~ '[^0-9 ]'
Ezequiel Tolnay
  • 4,302
  • 1
  • 19
  • 28
0

fnPhoneFix function created to fix all common phone numbers errors to and normalizes phone numbers.

Function removes all Non-Numeric characters, reduces above 10 digit numbers to 10, below 10 numbers set to '' and all repeating numbers like 2222222222 set to ''. These are the most common errors needing to fix a phone number improperly screened.

To Create the fnPhoneFix Function use the following code:

CREATE FUNCTION [dbo].[fnPhoneFix](@PhoneOld VarChar(20))
    Returns VarChar(10)
AS
    Begin
    Declare @iCnt Int = 0
    Declare @PhoneNew VarChar(15)  = ''

    IF @PhoneOld IS NULL
        RETURN NULL;

    While @iCnt <= LEN(@PhoneOld)
        Begin
            IF Substring(@PhoneOld,@iCnt,1) >= '0' AND Substring(@PhoneOld,@iCnt,1) <= '9'
            Begin
                SET @PhoneNew = @PhoneNew + Substring(@PhoneOld,@iCnt,1)
            End
            Set @iCnt = @iCnt + 1
        End

    If LEN(@PhoneNew) > 10 and Left(@PhoneNew, 1) = '1'
        Set @PhoneNew = RIGHT(@PhoneNew,10);
    Else
        Set @PhoneNew = Left(@PhoneNew,10);

    If @PhoneNew = '0000000000' or @PhoneNew = '1111111111' 
            or @PhoneNew = '2222222222' or @PhoneNew = '3333333333' or @PhoneNew = '4444444444' 
            or @PhoneNew = '5555555555' or @PhoneNew = '6666666666' or @PhoneNew = '7777777777' 
            or @PhoneNew = '8888888888' or @PhoneNew = '9999999999' or LEN(@PhoneNew) <> 10
        Set @PhoneNew ='';
    Return @PhoneNew
End

You can add formatting of @PhoneNew if you want to set numbers to a standardized Phone Format or leave as just as 10 digit numbers.

If @PhoneNew = ''
    Return Null;
Else
    Return (Left(@PhoneNew, 3) + '-' + Substring(@PhoneNew,4,3) + '-' + Right(@PhoneNew, 4));

A sample query to Update all phone numbers in a Table by using the Function. Can also be used to update a single record, but most often used to clean up a table. Then delete all Null or '' records in table and then doing a simple count query would give you a count of good phone numbers.

Print'/*Fix Phone Numbers Call*/'
Update tblTemp
    Set Phone = dbo.fnPhoneFix(tblTemp.Phone)
From tblTemp
Crazyd
  • 406
  • 2
  • 7
  • 15
-1

You can specify the junk phone numbers by using LIKE clause

i.e.

WHERE phone like ('%.%')

^ This would retrieve all junk phone numbers containing "." containing any characters % before or any characters % after

Ariel
  • 928
  • 4
  • 15
  • 32
  • Yes if i give like'%.%' i will just get the records having "." but my requirement is to get all invalid numbers like 11111111111,2222222222,3333333333.... LIKE wont work. Thanks for trying. – swathi Jun 23 '16 at 20:44
  • @swathi - It's because your question is not very clear. What exactly makes '33333333333' an invalid phone number? – Nick Jun 23 '16 at 20:45
  • @Nicarus : in my criteria we need to strip out the phone numbers which has all 10 digits same numbers. We consider such numbers as junk. – swathi Jun 23 '16 at 20:48