This is a continuation of the post below:
If a function returns multiple fields computed from two different arguments, how can I use apply() to add them all to a new pandas DataFrame?
Sample code:
import pandas as pd
from pandas import DataFrame

People_List = [['Jon', 'Smith', 21], ['Mark', 'Brown', 38], ['Maria', 'Lee', 42],
               ['Jill', 'Jones', 28], ['Jack', 'Ford', 55]]
df1 = DataFrame(People_List, columns=['First_Name', 'Last_Name', 'Age'])
Address_List = [['Jon', 'Chicago'], ['Mark', 'SFO'], ['Maria', 'Chicago'],
                ['Jill', 'Chicago'], ['Jack', 'Chicago']]
df2 = DataFrame(Address_List, columns=['First_Name', 'City'])
print(df1, df2)
  First_Name Last_Name  Age
0        Jon     Smith   21
1       Mark     Brown   38
2      Maria       Lee   42
3       Jill     Jones   28
4       Jack      Ford   55
  First_Name     City
0        Jon  Chicago
1       Mark      SFO
2      Maria  Chicago
3       Jill  Chicago
4       Jack  Chicago
def getTitleBirthYear(df1, df2):
    if 'Maria' in df1.First_Name:
        title = 'Ms'
    else:
        title = 'Mr'
    current_year = int('2020')
    birth_year = ''
    age = df1.Age
    birth_year = current_year - age
    if 'Chicago' in df2.City:
        state = 'IL'
    else:
        state = 'Other'
    return title, birth_year, state
    #return {'title':title,'birth_year':birth_year, 'state':state}

getTitleBirthYear(df1, df2)
  title  birth_year  state
0    Mr        1999     IL
1    Mr        1982  Other
2    Ms        1978     IL
3    Mr        1992     IL
4    Mr        1965     IL
df = df1.merge(df2, on='First_Name', how='inner')
print(df)
  First_Name Last_Name  Age     City
0        Jon     Smith   21  Chicago
1       Mark     Brown   38      SFO
2      Maria       Lee   42  Chicago
3       Jill     Jones   28  Chicago
4       Jack      Ford   55  Chicago
df[['title', 'birth_year', 'state']] = pd.DataFrame(df.apply(getTitleBirthYear, axis=1).tolist())
However, this raises the following error:
TypeError: ("getTitleBirthYear() missing 1 required positional argument: 'df2'", 'occurred at index 0')
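For context on the error: with axis=1, DataFrame.apply calls the function once per row and passes that row (a Series) as the only positional argument, so a function declared with two required parameters fails unless the second one is supplied explicitly. The small sketch below illustrates this; the names one_arg, two_args and other are made up for the illustration and are not part of the code in this post.

```python
import pandas as pd

df = pd.DataFrame({'First_Name': ['Jon', 'Mark'], 'Age': [21, 38]})

def one_arg(row):
    # Hypothetical helper: report what apply() hands to the function.
    return type(row).__name__

def two_args(row, other):
    # Hypothetical helper with a second required parameter.
    return row['Age'] + other

# Each row arrives as a single pandas Series.
row_types = df.apply(one_arg, axis=1).tolist()
print(row_types)

# A second parameter has to be passed explicitly, e.g. through args=.
shifted = df.apply(two_args, axis=1, args=(1,)).tolist()
print(shifted)

# Without args=, the call fails in the same way as in the post.
try:
    df.apply(two_args, axis=1)
    failed = False
except TypeError as exc:
    failed = True
    print('TypeError:', exc)
```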
Final expected output:
  First_Name Last_Name  Age     City title  birth_year  state
0        Jon     Smith   21  Chicago    Mr        1999     IL
1       Mark     Brown   38      SFO    Mr        1982  Other
2      Maria       Lee   42  Chicago    Ms        1978     IL
3       Jill     Jones   28  Chicago    Mr        1992     IL
4       Jack      Ford   55  Chicago    Mr        1965     IL
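One possible way to get this output, sketched under the assumption that the function can be rewritten to take a single merged row instead of two DataFrames: merge first, then apply a row-wise function and let result_type='expand' spread the returned tuple over the new columns. The function name get_title_birth_year and its current_year parameter are hypothetical and only mirror the logic of getTitleBirthYear above.

```python
import pandas as pd

people = [['Jon', 'Smith', 21], ['Mark', 'Brown', 38], ['Maria', 'Lee', 42],
          ['Jill', 'Jones', 28], ['Jack', 'Ford', 55]]
addresses = [['Jon', 'Chicago'], ['Mark', 'SFO'], ['Maria', 'Chicago'],
             ['Jill', 'Chicago'], ['Jack', 'Chicago']]
df1 = pd.DataFrame(people, columns=['First_Name', 'Last_Name', 'Age'])
df2 = pd.DataFrame(addresses, columns=['First_Name', 'City'])

# Merge first so that every row carries the fields from both frames.
df = df1.merge(df2, on='First_Name', how='inner')

def get_title_birth_year(row, current_year=2020):
    """Hypothetical single-row variant of getTitleBirthYear."""
    title = 'Ms' if row['First_Name'] == 'Maria' else 'Mr'
    birth_year = current_year - row['Age']
    state = 'IL' if row['City'] == 'Chicago' else 'Other'
    return title, birth_year, state

new_cols = ['title', 'birth_year', 'state']
# result_type='expand' turns each returned tuple into separate columns.
df[new_cols] = df.apply(get_title_birth_year, axis=1, result_type='expand')
print(df)
```

Comparing with == instead of in avoids a subtle trap: on a Series, in tests the index labels rather than the values, and on a plain string it is a substring test.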