
I have a dataframe with 3 columns. The first column, named "insured_relationship", takes the values ['own_child', 'wife', 'husband', 'unmarried', 'other_relationship']. The second column, named "insured_sex", takes the values ['Male', 'Female']. The third, named "incident_hour_of_the_day", takes the integer values [0, 1, 2, ..., 23]. I wrote a function, mar_status(), that applies some conditions to the 3 columns in order to create a new column named "marital_status" in the dataframe. But when I use the "apply" method on the 3 columns, I get the following error message about my function:

TypeError: mar_status() got multiple values for argument 'col1'

Note that col1 corresponds to the "insured_relationship" column of my dataframe.

This is the function which I created:

def mar_status(col1,col2,col3):
    if col1 == 'own_child' and col3 in range(13) :
        return 'unmarried'
    elif col1 == 'own_child' and col3 in range(13,24) :
        return 'divorced'
    elif col1 == 'wife' and col2 == 'Male' :
        return 'married'
    elif col1 == 'husband' and col2 == 'Female' :
        return 'married'
    elif col1 == 'unmarried':
        return 'unmarried'
    elif col1 == 'other_relationship':
        return 'in relationship'
    elif col1 == 'out_of_family' and col2 == 'Male':
        return 'widower'
    elif col1 == 'out_of_family' and col2 == 'Female' :
        return 'widow'

And the "apply" method:

df['marital_status'] = df.apply(mar_status,col1 ='insured_relationship',col2 ='insured_sex',
                                col3 = 'incident_hour_of_the_day',axis = 1)

I expected to create a new variable named "marital_status" which takes the values: ['unmarried', 'divorced', 'married', 'in_relationship', 'widow', 'widower'].

The function itself works, but applying it to the dataframe does not. How can I achieve the desired outcome?

Georgios
  • You should use a lambda function - this answer sets out how to do it https://stackoverflow.com/questions/13331698/how-to-apply-a-function-to-two-columns-of-pandas-dataframe – bn_ln Dec 20 '22 at 08:42

1 Answer


Try it like this:

df['marital_status'] = df.apply(lambda x: mar_status(x['insured_relationship'], x['insured_sex'], x['incident_hour_of_the_day']), axis=1)
insured_relationship  insured_sex  incident_hour_of_the_day  marital_status
own_child             Female       2                         unmarried
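
For context on why the original call raised the TypeError: with axis=1, apply passes each row as the first positional argument and forwards any extra keyword arguments to the function unchanged. So col1 receives the row positionally and 'insured_relationship' by keyword. Here is a minimal sketch of both the failing and the working call. The three-row frame is made up for illustration, and mar_status is a trimmed copy of the function in the question.

```python
import pandas as pd

# Hypothetical three-row frame using the value spellings from the question.
df = pd.DataFrame({
    "insured_relationship": ["own_child", "wife", "husband"],
    "insured_sex": ["Female", "Male", "Female"],
    "incident_hour_of_the_day": [2, 10, 20],
})

def mar_status(col1, col2, col3):
    # Trimmed copy of the question's function, enough for this demo.
    if col1 == "own_child" and col3 in range(13):
        return "unmarried"
    elif col1 == "own_child" and col3 in range(13, 24):
        return "divorced"
    elif (col1, col2) in {("wife", "Male"), ("husband", "Female")}:
        return "married"
    return None

# The original call: apply passes the row positionally (filling col1) and
# also forwards col1=... as a keyword, hence "multiple values for argument".
try:
    df.apply(mar_status, col1="insured_relationship", col2="insured_sex",
             col3="incident_hour_of_the_day", axis=1)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Working variant: pull the three cells out of each row explicitly.
df["marital_status"] = df.apply(
    lambda row: mar_status(row["insured_relationship"], row["insured_sex"],
                           row["incident_hour_of_the_day"]),
    axis=1,
)
```

The keyword arguments in the original call were column names (strings), not column values, so even without the clash the function would have compared the wrong things.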
Elodin
  • Thank you, but it returns only the value "unmarried". All other outputs are "None". – Georgios Dec 20 '22 at 09:43
  • Can you please provide some sample data? – Elodin Dec 20 '22 at 09:44
  • ,insured_relationship,insured_sex,incident_hour_of_the_day 0,own-child,MALE,4 1,not-in-family,MALE,5 2,unmarried,FEMALE,16 3,other-relative,FEMALE,5 4,husband,FEMALE,4 – Georgios Dec 20 '22 at 10:21
  • Make sure that the values in the data match the ones in the function. For example, the function checks for `Male`, but in your data it looks like it's written `MALE`. It should work if the data points are spelled the same way as in the function. – Elodin Dec 20 '22 at 11:01
  • Your answer was correct. I had some combinations that I hadn't taken into account in my function, and that's why the code wasn't running properly. Thanks again!! – Georgios Dec 20 '22 at 11:43
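
The mismatch discussed in the comments can be checked with a short sketch like the one below. It assumes the five sample rows posted above and the function from the question. The normalisation rules (lower-casing, replacing hyphens with underscores, capitalising the sex column) are an assumption about how the spellings are meant to line up, not something confirmed by the asker. Instead of apply with a lambda, it passes the three columns through zip, which is just an alternative way to make the same row-wise call.

```python
import pandas as pd

# The five sample rows posted in the comments.
df = pd.DataFrame({
    "insured_relationship": ["own-child", "not-in-family", "unmarried",
                             "other-relative", "husband"],
    "insured_sex": ["MALE", "MALE", "FEMALE", "FEMALE", "FEMALE"],
    "incident_hour_of_the_day": [4, 5, 16, 5, 4],
})

def mar_status(col1, col2, col3):
    # Function from the question, unchanged apart from indentation.
    if col1 == 'own_child' and col3 in range(13):
        return 'unmarried'
    elif col1 == 'own_child' and col3 in range(13, 24):
        return 'divorced'
    elif col1 == 'wife' and col2 == 'Male':
        return 'married'
    elif col1 == 'husband' and col2 == 'Female':
        return 'married'
    elif col1 == 'unmarried':
        return 'unmarried'
    elif col1 == 'other_relationship':
        return 'in relationship'
    elif col1 == 'out_of_family' and col2 == 'Male':
        return 'widower'
    elif col1 == 'out_of_family' and col2 == 'Female':
        return 'widow'

# Assumed normalisation: 'own-child' -> 'own_child', 'MALE' -> 'Male'.
rel = df["insured_relationship"].str.lower().str.replace("-", "_", regex=False)
sex = df["insured_sex"].str.capitalize()

df["marital_status"] = [
    mar_status(r, s, h)
    for r, s, h in zip(rel, sex, df["incident_hour_of_the_day"])
]

# Relationship values that no branch of the function handles.
unmatched = rel[df["marital_status"].isna()].unique().tolist()
```

Listing the unmatched values makes it easy to see which combinations the function still needs a branch for.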