Is there a way to add conditions for invalid phone numbers to my code?

Question

I have the following code in Python (pandas) on Databricks. It works, but it does not filter out invalid phone numbers.

The code matches each number against a pattern and labels it as either a home or a mobile phone number.

import pandas as pd 
import re
from pyspark.sql.functions import lit, udf

df = Phonevalidation

# function to check the phone number pattern
def isValid(s): 
  Pattern = re.compile("(0|44)?[7-9][0-9]{9}") 
  if(Pattern.match(s)):
    return 'Mobile Number'
  else: return 'Home phone'

#UDF Register
PhType = udf(isValid)

df1 = Phonevalidation.withColumn('Phtype' ,PhType('Phonenumber') )
display(df1)

I expect phone numbers with a length greater or less than 10, or numbers like 0000000 or 11111, to be tagged as invalid.

score 0 · Accepted Answer · answered Aug 20 '19 at 09:17

The code you are currently using marks a number as a mobile number if it starts with an optional leading zero or UK country code (44), followed by a 7, 8 or 9 and then nine more digits. Everything else (including malformed numbers) is marked as a home number:

  Pattern = re.compile("(0|44)?[7-9][0-9]{9}") 
  if(Pattern.match(s)):
    return 'Mobile Number'
  else: return 'Home phone'
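
To make that behaviour concrete, here is a small sketch that runs the same function on a few made-up sample strings (the samples are illustrative only, not taken from your data). Note that re.match only anchors the start of the string, so an overlong number that begins with a valid prefix is still labelled as a mobile number:

```python
import re

def isValid(s):
    # Same pattern and logic as in the question
    Pattern = re.compile("(0|44)?[7-9][0-9]{9}")
    if Pattern.match(s):
        return 'Mobile Number'
    else:
        return 'Home phone'

# Hypothetical sample inputs, chosen only to show the edge cases
samples = ['07123456789', '447123456789', '0000000', '11111', '712345678901234']
results = {s: isValid(s) for s in samples}

for s, label in results.items():
    print(s, '->', label)
```

The short junk values end up as 'Home phone' and the 15-digit string ends up as 'Mobile Number', which is why an explicit invalid branch is needed.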

If you are after US numbers, the question "grep with regex for phone number" might help.

I expect phone numbers with a length greater or less than 10, or numbers like 0000000 or 11111, to be tagged as invalid.

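For the first part of your idea you can use a pattern like Pattern = re.compile("^[0-9]{10}$"). The anchors matter: without them, match only checks the start of the string, so numbers longer than 10 digits would still pass. The second part I would put into pseudocode like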

if Pattern.fullmatch(s):
   # both comparisons must hold, so this needs 'and', not 'or'
   if s != '0000000000' and s != '1111111111':
      return 'Fitting your criteria'
return 'Not valid'
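
Putting both parts together with your original mobile/home split, a minimal sketch could look like the following. The name classify_phone, the 'Invalid' label, and the choice to accept the optional 0 or 44 prefix before the 10 digits are my assumptions, so adjust them to your own rules:

```python
import re

# Assumption: a valid number is exactly 10 digits, optionally preceded by
# 0 or 44 (the same prefix the pattern in the question allows).
TEN_DIGITS = re.compile(r"(0|44)?[0-9]{10}")
MOBILE = re.compile(r"(0|44)?[7-9][0-9]{9}")
REPEATED = re.compile(r"(\d)\1*")  # a single digit repeated, e.g. 0000000000

def classify_phone(s):
    """Return 'Invalid', 'Mobile Number' or 'Home phone' for a string."""
    if s is None:
        return 'Invalid'
    s = s.strip()
    # fullmatch requires the whole string to match, so wrong lengths fail
    if not TEN_DIGITS.fullmatch(s) or REPEATED.fullmatch(s):
        return 'Invalid'
    if MOBILE.fullmatch(s):
        return 'Mobile Number'
    return 'Home phone'

# In the notebook this could replace isValid, for example:
#   PhType = udf(classify_phone)
#   df1 = Phonevalidation.withColumn('Phtype', PhType('Phonenumber'))
```

The invalid check runs first so that values such as 11111 or 0000000 never reach the home phone branch.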
