-1

I have a DataFrame like this:

   Contact Number
0   
1   NaN
2   6363887122.0
3   6363887122.0

I want this:

   Contact Number Status_contactNUmber  Invalid_contactNUmber
0                           Blank/Null                   True
1             NaN           Blank/Null                   True
2      6363887122                Valid                  False
3      6363887122                Valid                  False

I tried this:

def contactNumber(ele):
    if (pd.isna(ele) or (ele=='')):
        return ("Blank/Null",True)
    elif re.search(r'^([0]|\+91)?[6789]\d{9}$',ele):
#     elif ele.str.contains(r'^([0]|\+91)?[6789]\d{9}$'):
        return ("Valid",False)
    else:
        return ("invalid",True)
df[['Status_contactNUmber','Invalid_contactNUmber']] = df['Contact Number'].apply(contactNumber).tolist()

But it raises an error, because the Contact Number column is of float type.

  • Use `df["Contact Number"].astype(int)` to get those values as integers. – Shubham May 29 '21 at 06:17
  • That does not work for null/blank values. –  May 29 '21 at 06:18
  • This has previously been answered on SO [here](https://stackoverflow.com/questions/41550746/error-using-astype-when-nan-exists-in-a-dataframe/41550787). I would suggest you use the `fillna()` method to replace NaNs with invalid values. Or you can use pandas' **Int64** which does allow NaNs as mentioned in the post linked – Shubham May 29 '21 at 06:21

2 Answers

0

Please convert your column from float to string.
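
A minimal sketch of one way to do that conversion, assuming the column holds only whole-number floats and NaN. Going through pandas' nullable `Int64` dtype first drops the trailing `.0`. The column name is from the question; the sample data and the `as_text` name are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question's column (assumed, not the real data)
df = pd.DataFrame({"Contact Number": [np.nan, 6363887122.0, 6363887122.0]})

# float -> nullable Int64 drops the ".0"; -> string keeps missing values as <NA>
as_text = df["Contact Number"].astype("Int64").astype("string")
print(as_text.tolist())
```

The missing entry stays missing after the conversion, so a `pd.isna` check on each element still works before the regex is applied.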

0

First of all, the 0 you see at the top of your df's index is actually the name of the index, not the first row. The first row of your df is at index 1 (the NaN value). The dtype points the same way: if the Contact Number column is of float type, it cannot hold a "" value. A missing entry shows up as NaN, just as it does at index 1.

I confirmed this by copying your df and checking its index (see the name '0' and index starting from 1):

>>> df = pd.read_clipboard(r'\s\s+')
>>> df.index
Int64Index([1, 2, 3], dtype='int64', name='0')

To get what you want, handle it inside your function: convert ele to int first to drop the trailing .0 from the phone number, then convert it to str for regex matching:

import re
import pandas as pd

def contactNumber(ele):
    # NaN marks a missing number in a float column
    if pd.isna(ele):
        return ("Blank/Null", True)
    # int() drops the trailing .0; str() gives re.search a string to match
    elif re.search(r'^([0]|\+91)?[6789]\d{9}$', str(int(ele))):
        return ("Valid", False)
    else:
        return ("invalid", True)

You don't need the (ele=='') condition, because as stated, float type columns will not have blank strings.

Output:

>>> df[['Status_contactNUmber','Invalid_contactNUmber']] = df['Contact Number'].apply(contactNumber).tolist()
>>> df
   Contact Number Status_contactNUmber  Invalid_contactNUmber
0                                                            
1             NaN           Blank/Null                   True
2      6363887122              Valid                    False
3      6363887122              Valid                    False
Ank