I have been trying to figure out how to pass two columns to a function and get an output, but I have been having a lot of trouble with the syntax.
I've been banging my head against a wall all day; here is what I looked at already:
(I figured I was using apply wrong) Pandas: How to apply a function to different columns
Difference between map, applymap and apply methods in Pandas
I reread apply but it didn't help. I am working with the Titanic dataset (https://github.com/alexisperrier/packt-aml/blob/master/ch4/titanic.csv) and trying to replace missing ages with fixed values that depend on the passenger class. I tried two ways to do this:
Titanic.loc[(Titanic['pclass'] == 1) & (Titanic['age'].isnull()), 'age'] = 35
Titanic.loc[(Titanic['pclass'] == 2) & (Titanic['age'].isnull()), 'age'] = 25
Titanic.loc[(Titanic['pclass'] == 3) & (Titanic['age'].isnull()), 'age'] = 20
(This code worked just fine, replacing the missing 'age' values with predetermined ones.) My first attempt, though, was to write a function and apply it. Function:
def ClassAge(age,pclass):
    if age.isnull:
        if pclass == 1:
            n = 35
        if pclass == 2:
            n = 25
        if pclass == 3:
            n = 20
    return(n)
I tried to apply it using this:
Titanic.age.apply(ClassAge,Titanic['pclass'], axis=1)
Output:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Based on what I read in the other answers I tried this, because apply assumes that rows are the input.
Titanic[['age','pclass']].apply(ClassAge)
Which gave me this:
TypeError: ("ClassAge() missing 1 required positional argument: 'pclass'", 'occurred at index age')
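The 'occurred at index age' part of that message suggests the function was called once per column rather than once per row. A minimal sketch to check this, assuming a tiny made-up frame in place of the real data (`df` and `describe` are hypothetical names, not part of my script):

```python
import pandas as pd

# Tiny stand-in for the two Titanic columns (made-up values, not the real data).
df = pd.DataFrame({"age": [22.0, None, 30.0], "pclass": [3, 1, 2]})

def describe(arg):
    # Report the type and the label of whatever apply hands to the function.
    return f"{type(arg).__name__}:{arg.name}"

# Default axis=0: the function receives each column as one Series argument.
by_column = df.apply(describe)

# axis=1: the function receives each row as one Series argument.
by_row = df.apply(describe, axis=1)
```

In both cases the function gets a single Series argument, which would explain why a two-parameter function complains about a missing 'pclass'.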
As mentioned above, I did resolve the issue using .loc, but for educational purposes I'd like to understand what I am doing wrong, either in writing the function, in calling it, or potentially both.
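For comparison, here is a rough sketch of what I think a row-wise version might look like. It is not confirmed as the right approach; `df`, `DEFAULT_AGE` and `class_age` are hypothetical names, and the data is made up rather than taken from the Titanic file:

```python
import numpy as np
import pandas as pd

# Made-up sample with the same two columns as the Titanic data.
df = pd.DataFrame({
    "age": [22.0, np.nan, np.nan, np.nan, 40.0],
    "pclass": [3, 1, 2, 3, 1],
})
original = df.copy()

# Replacement ages per passenger class (same values as the .loc version).
DEFAULT_AGE = {1: 35, 2: 25, 3: 20}

def class_age(row):
    # With axis=1 the function receives one row as a Series,
    # so both values are read from that single argument.
    if pd.isnull(row["age"]):
        return DEFAULT_AGE[int(row["pclass"])]
    return row["age"]

df["age"] = df.apply(class_age, axis=1)

# Vectorized equivalent without apply, for cross-checking the result.
vectorized = original["age"].fillna(original["pclass"].map(DEFAULT_AGE))
```

The function takes the whole row instead of two separate arguments, and the existing age is returned unchanged when it is not missing, so `n` can never be undefined.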