Pandas: Running a function to apply on a dataset

Question

I have been trying to figure out how to pass two variables (two column values from each row) to a function and get an output, but I have been having a lot of trouble with the syntax.

I've been banging my head against a wall all day; here is what I looked at already:

(I figured I was using apply wrong) Pandas: How to apply a function to different columns

Difference between map, applymap and apply methods in Pandas

I reread the apply documentation, but it didn't help. I am working with the Titanic dataset (https://github.com/alexisperrier/packt-aml/blob/master/ch4/titanic.csv) and trying to replace missing ages with fixed values that depend on the passenger class. I tried two ways to do this:

Titanic.loc[(Titanic['pclass'] == 1) & (Titanic['age'].isnull()), 'age'] = 35
Titanic.loc[(Titanic['pclass'] == 2) & (Titanic['age'].isnull()), 'age'] = 25
Titanic.loc[(Titanic['pclass'] == 3) & (Titanic['age'].isnull()), 'age'] = 20

(This code worked just fine, replacing empty 'age' values with predetermined ones.) My first attempt, though, was to create a function and apply it. Function:

def ClassAge(age,pclass):
    if age.isnull:
        if pclass == 1:
            n = 35
        if pclass == 2:
            n = 25
        if pclass == 3:
            n = 20
    return(n)

I tried to apply it using this:

Titanic.age.apply(ClassAge,Titanic['pclass'], axis=1)

Output:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Based on what I read in the other answers I tried this, because apply assumes that rows are the input.

Titanic[['age','pclass']].apply(ClassAge)

Which gave me this:

TypeError: ("ClassAge() missing 1 required positional argument: 'pclass'", 'occurred at index age')

As mentioned above, I did resolve the issue using .loc, but for educational purposes I'd like to understand what I am doing wrong, either in writing the function or in calling it (or potentially both).

score 1 · Accepted Answer · answered Aug 10 '18 at 19:00

When applying a lambda row by row (axis=1), pass each row's own values to the function rather than the entire pclass Series:

df.apply(lambda x: ClassAge(x['age'],x['pclass']), axis=1)

mad_

Thanks, after a little change to my core code this worked. (Since the function now runs row by row, .isnull no longer works on the individual values, but I changed the code around a bit to make it function as expected.) – Kafka Aug 10 '18 at 20:26
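
For reference, here is a minimal sketch of how the function might be adapted for a row-wise call. It is not the asker's final code, which was not posted. The small DataFrame, the name class_age, and the DEFAULT_AGE dictionary are hypothetical stand-ins; the default ages are the ones from the question. With axis=1 the function receives single values, and a float has no isnull attribute, so the check uses pd.isnull instead. The function also returns the original age when it is present, which the original ClassAge did not do.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic data: only the two relevant columns.
df = pd.DataFrame({
    "pclass": [1, 2, 3, 1, 3],
    "age": [29.0, np.nan, np.nan, np.nan, 22.0],
})

# Default ages per passenger class, taken from the question.
DEFAULT_AGE = {1: 35, 2: 25, 3: 20}

def class_age(age, pclass):
    """Return age unchanged, or the class default when age is missing."""
    if pd.isnull(age):  # works on a single value, unlike age.isnull
        return DEFAULT_AGE[pclass]
    return age

# axis=1 passes one row at a time; the lambda unpacks the two values.
df["age"] = df.apply(lambda row: class_age(row["age"], row["pclass"]), axis=1)
print(df)
```

For large frames the .loc assignments from the question are likely to be faster, since a row-wise apply calls the Python function once per row.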
