
I have two dataframes. The first contains the officer IDs and officer names in a company:

officer_df = pd.DataFrame({'officerID': ['01', '02', '03'], 'Name': ['Tom', 'Dick', 'Harry']})

The second contains the officer IDs and leave dates for officers who have applied for leave:

 df_officer_leave = pd.DataFrame({'officerID': ['01', '01'], 'leave start date': ['2020-12-15', '2020-12-31'], 'leave end date': ['2020-12-16', '2021-01-02']})

I want to use a function, leave_col_set, to look up each officerID from officer_df in df_officer_leave and return a list of [leave start date, leave end date] pairs. The returned list should then be added to officer_df as a new column, matched on officerID, but I keep getting an error.

I am at a loss, so I have come to Stack Overflow for guidance. Thank you in advance, kind souls.

import pandas as pd

officer_df = pd.DataFrame({'officerID': ['01', '02', '03'], 'Name': ['Tom', 'Dick', 'Harry']})

df_officer_leave = pd.DataFrame({'officerID': ['01', '01'], 'leave start date': ['2020-12-15', '2020-12-31'], 'leave end date': ['2020-12-16', '2021-01-02']})

df_officer_leave['leave start date'] = pd.to_datetime(df_officer_leave['leave start date'])
df_officer_leave['leave end date'] = pd.to_datetime(df_officer_leave['leave end date'])

def leave_col_set(x, df_officer_leave):
    # leave logic
    return [*df_officer_leave[df_officer_leave['officerID']==x][['leave start date', 'leave end date']].values.tolist()]

officer_df["leaveDays"] = officer_df.officerID.apply(leave_col_set, args=(df_officer_leave))

gtomer

1 Answer


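The args parameter of apply expects a tuple, and (df_officer_leave) is just the dataframe wrapped in parentheses. A trailing comma is needed to make it a one-element tuple, so the correct syntax for your last line is: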

officer_df["leaveDays"] = officer_df.officerID.apply(leave_col_set, args=(df_officer_leave, ))

For more information see: Pass Dataframe to Apply function pandas as argument
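
To illustrate why the trailing comma matters, here is a minimal sketch. The names lookup_df, add_offset and values are hypothetical and chosen only for this example; they are not part of the question's code.

```python
import pandas as pd

# Hypothetical two-row dataframe used only for illustration
lookup_df = pd.DataFrame({"a": [1, 2]})

not_a_tuple = (lookup_df)   # parentheses only group: this is still the DataFrame
one_tuple = (lookup_df,)    # the trailing comma creates a one-element tuple

print(type(not_a_tuple).__name__)  # DataFrame
print(type(one_tuple).__name__)    # tuple

def add_offset(x, lookup):
    # Adds the number of rows in the lookup dataframe to x
    return x + len(lookup)

values = pd.Series([10, 20])
# Each element of the tuple is passed as an extra positional argument after x
result = values.apply(add_offset, args=(lookup_df,))
print(result.tolist())  # [12, 22]
```

The same rule applies to any one-element tuple in Python, not only to the args parameter of apply.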

That said, I would strongly consider not passing a whole dataframe as an argument, especially if you are only extracting information from it.

In your case the following would suffice as you can simply access df_officer_leave from inside the function:

import pandas as pd

officer_df = pd.DataFrame({'officerID': ['01', '02', '03'], 
                           'Name': ['Tom', 'Dick', 'Harry']})
df_officer_leave = pd.DataFrame({'officerID': ['01', '01'], 
                                 'leave start date': ['2020-12-15', '2020-12-31'], 
                                 'leave end date': ['2020-12-16', '2021-01-02']})

df_officer_leave['leave start date']= pd.to_datetime(df_officer_leave['leave start date'])
df_officer_leave['leave end date']= pd.to_datetime(df_officer_leave['leave end date'])

def leave_col_set(x):
    return [*df_officer_leave[df_officer_leave['officerID']==x][['leave start date', 'leave end date']].values.tolist()]
    
officer_df['leaveDays'] = officer_df.officerID.apply(leave_col_set)
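
If you would rather not rely on a module-level variable inside the function, one possible middle ground is to pass the dataframe by keyword, since Series.apply forwards extra keyword arguments to the function. This is only a sketch under some assumptions: the parameter name leave_df and the variable periods_per_officer are made up for the example, and the dates are left as strings for brevity.

```python
import pandas as pd

officer_df = pd.DataFrame({'officerID': ['01', '02', '03'],
                           'Name': ['Tom', 'Dick', 'Harry']})
df_officer_leave = pd.DataFrame({'officerID': ['01', '01'],
                                 'leave start date': ['2020-12-15', '2020-12-31'],
                                 'leave end date': ['2020-12-16', '2021-01-02']})

def leave_col_set(x, leave_df):
    # Select the rows of leave_df for officer x and return them as [start, end] pairs
    matches = leave_df[leave_df['officerID'] == x]
    return matches[['leave start date', 'leave end date']].values.tolist()

# leave_df is a hypothetical parameter name; apply forwards it to leave_col_set
officer_df['leaveDays'] = officer_df['officerID'].apply(leave_col_set, leave_df=df_officer_leave)

# Number of leave periods found for each officer
periods_per_officer = officer_df['leaveDays'].apply(len).tolist()
print(periods_per_officer)  # [2, 0, 0]
```

Officers without any leave rows get an empty list, as in the version above.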
Loic RW
  • Wow thank you Loic RW! My problem has been solved and it is great to be able to learn something new. Thank you for your time and valuable information. – Adeline Sidik Feb 16 '21 at 01:22
  • Glad to help! Please remember to accept the answer if you found it helpful. – Loic RW Feb 16 '21 at 09:46