How to add the output of a function call that returns multiple fields as new columns in a pandas DataFrame?

Sample code & data:

from pandas import DataFrame

People_List = [['Jon', 'Smith', 21], ['Mark', 'Brown', 38], ['Maria', 'Lee', 42], ['Jill', 'Jones', 28], ['Jack', 'Ford', 55]]
df = DataFrame(People_List, columns=['First_Name', 'Last_Name', 'Age'])
print(df)


  First_Name Last_Name  Age
0        Jon     Smith   21
1       Mark     Brown   38
2      Maria       Lee   42
3       Jill     Jones   28
4       Jack      Ford   55


def getTitleBirthYear(df):
    if 'Maria' in df.First_Name:
        title = 'Ms'
    else:
        title = 'Mr'
    current_year = 2020
    birth_year = current_year - df.Age
    return title, birth_year

getTitleBirthYear(df)

  title birth_year
0 Mr    1999
1 Mr    1982
2 Ms    1978
3 Mr    1992
4 Mr    1965

Final expected output:

  First_Name Last_Name  Age title  birth_year
0        Jon     Smith   21    Mr        1999
1       Mark     Brown   38    Mr        1982
2      Maria       Lee   42    Ms        1978
3       Jill     Jones   28    Mr        1992
4       Jack      Ford   55    Mr        1965

Please suggest. Thanks!

ManiK

2 Answers

Although you can use apply, it is better to use vectorized functions (see "When should I (not) want to use pandas apply() in my code?"). Your logic can be simplified as follows:

import numpy as np
import pandas as pd

print(df.assign(title=np.where(df["First_Name"].eq("Maria"), "Ms", "Mr"),
                birth_year=pd.Timestamp.now().year - df["Age"]))  # use 2020 - df["Age"] to reproduce the output below

  First_Name Last_Name  Age title  birth_year
0        Jon     Smith   21    Mr        1999
1       Mark     Brown   38    Mr        1982
2      Maria       Lee   42    Ms        1978
3       Jill     Jones   28    Mr        1992
4       Jack      Ford   55    Mr        1965
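
If you still want a single function call that returns both fields, one option is to keep the function vectorized and have it return a small DataFrame that is joined back on the index. The following is only a sketch under assumptions: the function name `title_and_birth_year` and the `reference_year` parameter are hypothetical, and the year is fixed at 2020 so the result matches the output above.

```python
import numpy as np
import pandas as pd

people = [['Jon', 'Smith', 21], ['Mark', 'Brown', 38], ['Maria', 'Lee', 42],
          ['Jill', 'Jones', 28], ['Jack', 'Ford', 55]]
df = pd.DataFrame(people, columns=['First_Name', 'Last_Name', 'Age'])


def title_and_birth_year(frame, reference_year=2020):
    """Hypothetical helper: compute both fields for every row at once.

    Returns a two-column DataFrame that shares the index of `frame`.
    """
    return pd.DataFrame(
        {
            # Element-wise comparison against the whole column, no Python loop
            'title': np.where(frame['First_Name'].eq('Maria'), 'Ms', 'Mr'),
            # Assumption: reference_year is fixed so the result is reproducible
            'birth_year': reference_year - frame['Age'],
        },
        index=frame.index,
    )


# join() aligns on the index and returns a new frame; df itself is unchanged
result = df.join(title_and_birth_year(df))
print(result)
```

Because the helper returns a DataFrame with the same index, `df.join` lines the new columns up with the right rows even if `df` has been filtered or re-sorted beforehand.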
Henry Yik

Here are two ways to use apply and create the new columns:

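# Option 1 (assumes `import pandas as pd`): build a DataFrame from the list of returned tuples
df[['title', 'birth_year']] = pd.DataFrame(df.apply(getTitleBirthYear, axis=1).tolist())

# Option 2: let apply expand each returned tuple into columns
df[['title', 'birth_year']] = df.apply(getTitleBirthYear, axis=1, result_type='expand')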

  First_Name Last_Name  Age title  birth_year
0        Jon     Smith   21    Mr        1999
1       Mark     Brown   38    Mr        1982
2      Maria       Lee   42    Ms        1978
3       Jill     Jones   28    Mr        1992
4       Jack      Ford   55    Mr        1965
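
For reference, here is a self-contained sketch of the `result_type='expand'` variant. It is not the asker's exact function: the name `get_title_birth_year`, the `row` parameter name, and the `current_year` default are assumptions made for illustration, and the title check uses `==` for an exact match.

```python
import pandas as pd

people = [['Jon', 'Smith', 21], ['Mark', 'Brown', 38], ['Maria', 'Lee', 42],
          ['Jill', 'Jones', 28], ['Jack', 'Ford', 55]]
df = pd.DataFrame(people, columns=['First_Name', 'Last_Name', 'Age'])


def get_title_birth_year(row, current_year=2020):
    """Hypothetical row-wise variant: `row` is one row of df (a Series)."""
    # Exact match on the first name of this single row
    title = 'Ms' if row['First_Name'] == 'Maria' else 'Mr'
    # Assumption: current_year is fixed at 2020 to match the output above
    birth_year = current_year - row['Age']
    return title, birth_year


# axis=1 calls the function once per row; 'expand' turns each returned
# tuple into separate columns, which are assigned to the two new names
df[['title', 'birth_year']] = df.apply(
    get_title_birth_year, axis=1, result_type='expand'
)
print(df)
```

With `axis=1` the function receives one row at a time, so `row['First_Name']` is a plain string. In the original function, `'Maria' in df.First_Name` is then a substring test on that string, which would also match a name such as 'Mariana'.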
Kenan
  • You're welcome! Does that completely answer your question? – Kenan Dec 02 '20 at 15:40
  • To take this question further: what if my sample function takes two different DataFrames as arguments, for example **getTitleBirthYear(df1, df2)**? Can you please help with how to use apply() in that case? The same statement as above with two arguments gives this error: getTitleBirthYear() missing 1 required positional argument: 'df2'", 'occurred at index 0' – ManiK Dec 02 '20 at 17:49
  • This sounds like a more involved question. Can you create a new post for it? Also, will you need the output appended to df1 or df2, and which DataFrame is the apply called on? – Kenan Dec 02 '20 at 18:23
  • Posted a follow-up question here: [link](https://stackoverflow.com/questions/65122321/how-to-add-new-columns-into-a-new-dataframe-using-output-of-single-function-call) – ManiK Dec 03 '20 at 08:34
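
Regarding the "missing 1 required positional argument" error in the comments above: `DataFrame.apply` passes only the row (or column) to the function, and any extra positional arguments can be supplied through its `args` parameter. The sketch below is only an illustration under assumptions, since the follow-up does not say what the second frame contains: `df2` is a hypothetical lookup table of titles by first name, and `get_title_birth_year` is a hypothetical two-argument variant of the function.

```python
import pandas as pd

people = [['Jon', 'Smith', 21], ['Mark', 'Brown', 38], ['Maria', 'Lee', 42],
          ['Jill', 'Jones', 28], ['Jack', 'Ford', 55]]
df1 = pd.DataFrame(people, columns=['First_Name', 'Last_Name', 'Age'])

# Hypothetical second frame (not from the original post): titles by first name
df2 = pd.DataFrame({'First_Name': ['Maria'], 'title': ['Ms']})


def get_title_birth_year(row, lookup, current_year=2020):
    """Hypothetical variant: `row` is one row of df1, `lookup` is all of df2."""
    # Find the title for this row's first name, falling back to 'Mr'
    match = lookup.loc[lookup['First_Name'] == row['First_Name'], 'title']
    title = match.iloc[0] if not match.empty else 'Mr'
    return title, current_year - row['Age']


# args=(df2,) is forwarded after the row, so the call per row is
# get_title_birth_year(row, df2)
df1[['title', 'birth_year']] = df1.apply(
    get_title_birth_year, axis=1, result_type='expand', args=(df2,)
)
print(df1)
```

Here the apply runs on `df1` and the results are appended to `df1`. If the second frame is really a per-row lookup like this one, a merge on the shared key may be a simpler alternative to a row-wise apply.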