Let's say I have some customer data. The data is entered by the customers themselves and it's messy, so they put their city into the city field, the county field, or both! That means I may need to check both columns to find out which city they are from.
import pandas as pd

mydf = pd.DataFrame({'name': ['jim', 'jon'],
                     'city': ['new york', ''],
                     'county': ['', 'los angeles']})
print(mydf)
  name      city       county
0  jim  new york
1  jon            los angeles
And I am using an API to get their zipcode. There is a different API function for each city, and it returns the zipcode for the customer's address, e.g. 123 main street, new york. I haven't included the full address here to save time.
# api for new york addresses
def get_NY_zipcode_api():
    return 'abc123'

# api for chicago addresses
def get_CH_zipcode_api():
    return 'abc124'

# api for los angeles addresses
def get_LA_zipcode_api():
    return 'abc125'

# api for miami addresses
def get_MI_zipcode_api():
    return 'abc126'
Depending on the city, I will call a different API. So for now, I am checking: if city == x or county == x, call api_x:
def myfunc(row):
    city = row['city']
    county = row['county']
    if city == 'chicago' or county == 'chicago':
        # call chicago api
        zipcode = get_CH_zipcode_api()
        return zipcode
    elif city == 'new york' or county == 'new york':
        # call new york api
        zipcode = get_NY_zipcode_api()
        return zipcode
    elif city == 'los angeles' or county == 'los angeles':
        # call los angeles api
        zipcode = get_LA_zipcode_api()
        return zipcode
    elif city == 'miami' or county == 'miami':
        # call miami api
        zipcode = get_MI_zipcode_api()
        return zipcode
And I apply() this to the df and get my results:
mydf['result'] = mydf.apply(myfunc,axis=1)
print(mydf)
  name      city       county  result
0  jim  new york               abc123
1  jon            los angeles  abc125
I actually have about 30 cities and therefore 30 conditions to check, so I want to avoid a long list of elif statements. What would be the most efficient way to do this?
I found some suggestions in a similar Stack Overflow question, such as creating a dictionary with key:city and value:function and calling it based on city:
operationFuncs = {
    'chicago': get_CH_zipcode_api,
    'new york': get_NY_zipcode_api,
    'los angeles': get_LA_zipcode_api,
    'miami': get_MI_zipcode_api
}
But as far as I can see, this only works if I am checking a single column / single condition. I can't see how it can work with a check like if city == x or county == x.
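One way the dictionary idea might stretch to two columns is to try each field in turn and call the first function that matches. The following is only a sketch under some assumptions: `lookup_zipcode` is a hypothetical helper name, the API functions are the same stubs as in the question, and rows where neither field holds a known city simply get `None`.

```python
import pandas as pd

# Stub API functions, copied from the question for a self-contained example.
def get_NY_zipcode_api():
    return 'abc123'

def get_CH_zipcode_api():
    return 'abc124'

def get_LA_zipcode_api():
    return 'abc125'

def get_MI_zipcode_api():
    return 'abc126'

# Dispatch table: city name -> function to call for that city.
operationFuncs = {
    'chicago': get_CH_zipcode_api,
    'new york': get_NY_zipcode_api,
    'los angeles': get_LA_zipcode_api,
    'miami': get_MI_zipcode_api,
}

def lookup_zipcode(row):
    # Hypothetical helper: check the city field first, then the county field,
    # and call the API function for the first value found in the table.
    for value in (row['city'], row['county']):
        func = operationFuncs.get(value)
        if func is not None:
            return func()
    # Neither field names a known city.
    return None

mydf = pd.DataFrame({'name': ['jim', 'jon'],
                     'city': ['new york', ''],
                     'county': ['', 'los angeles']})
mydf['result'] = mydf.apply(lookup_zipcode, axis=1)
print(mydf)
```

Adding a city then means adding one dictionary entry instead of another elif branch. One difference from the elif chain: if both fields hold different known cities, this sketch prefers the city field, whereas the chain prefers whichever city is tested first.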