I am trying to clean phone numbers using the phonenumbers library. I wrote a function that extracts the country code and national number and stores them in the columns 'country_code' and 'national_number'.

I want to run it with apply() on a DataFrame that contains noisy numbers. I am using apply rather than a loop for the performance gain. The code is below:

import phonenumbers
import pandas as pd
df_phone = pd.read_csv(r'D:\Code\Address-Nominatim\Address-Nominatim\Phone_Valid.csv',encoding='utf8')
df_phone['country_code'] = ''
df_phone['national_number'] = ''
df_phone['valid']=''

def phone_valid(phone):
    try:
        #print(phone['PHONE'] + " " + phone['COUNTRY'])
        x = phonenumbers.parse(phone['PHONE'],phone['COUNTRY'])
        df_phone['country_code'] = x.country_code
        df_phone['national_number'] = x.national_number
        df_phone['valid']=phonenumbers.is_possible_number(x)
    except:
        df_phone['country_code'] = "Error"
        df_phone['national_number'] = "Error"


df_phone=df_phone.apply(phone_valid,axis=1)

print(df_phone)

But afterwards df_phone contains only None values. Below is sample output of df_phone:

none none
1 none
2 none

Can someone tell me what mistake I am making?

Regards,

    You aren't supposed to assign to the dataframe within your `apply` function; you're just supposed to return a value. – AKX Jul 17 '22 at 18:43

1 Answer

You aren't supposed to assign into the dataframe from inside the function you pass to apply. (Think of the case where the function doesn't even have access to the global df_phone variable.)
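A minimal sketch of why the result in the question was all None: apply builds its result from whatever the function returns for each row, and a function with no return statement returns None. The frame and the two helper functions here are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'PHONE': ['100', '200', '300']})


def returns_nothing(row):
    # Does some work but has no return statement,
    # so Python implicitly returns None for every row.
    _ = row['PHONE'] * 2


def returns_value(row):
    # Returns the computed value, which apply collects into the result.
    return row['PHONE'] * 2


nothing = df.apply(returns_nothing, axis=1)  # a Series holding only None
doubled = df.apply(returns_value, axis=1)    # a Series holding the returned strings

print(nothing)
print(doubled)
```

Assigning `nothing` back to the dataframe variable, as the question does, replaces the whole frame with that Series of None values.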

Instead, just return the new values from your function and let pandas assign them. Since you need to fill multiple columns, return a tuple and pass result_type="expand", something like this (self-contained example; replace the stand-in parse and phone_valid with your implementation):

import pandas as pd

df_phone = pd.DataFrame({
    'PHONE': ['100', '200', '300', '400', '500'],
    'COUNTRY': ['FI', 'US', 'SV', 'DE', 'FR'],
})


def parse(phone, country):
    return (phone * 3, country[::-1])


def phone_valid(phone):
    national, country = parse(phone['PHONE'], phone['COUNTRY'])
    return (national, country, True)


df_phone[['national', 'country', 'valid']] = df_phone.apply(phone_valid, axis=1, result_type="expand")

print(df_phone)

The output is

  PHONE COUNTRY   national country  valid
0   100      FI  100100100      IF   True
1   200      US  200200200      SU   True
2   300      SV  300300300      VS   True
3   400      DE  400400400      ED   True
4   500      FR  500500500      RF   True
AKX
    Thanks AKX. This did the trick. I modified the code as per your suggestions and it worked. I still have some doubts, but I guess I need more practice. – MAYANK PANDE Jul 18 '22 at 08:15