I have a dataframe like so:

            df
Id    Severity    First Discovered
0      Low            1/1/2021
1      Medium         1/1/2021
2      Medium         1/1/2021

I've also defined a function below that assists in creating "Target Close Date" which adds a certain number of days to the "First Discovered" field depending on what the corresponding "Severity" value is.

from datetime import timedelta

def get_target_close_date(severity, first_discovered_date):
    '''Adds days to first discovered date depending on severity'''
    if severity == 'Low':
        target_close_date = first_discovered_date + timedelta(days=30)
    elif severity == 'Medium':
        target_close_date = first_discovered_date + timedelta(days=60)
    else:
        # Any other severity has no rule here; avoid an UnboundLocalError
        target_close_date = None

    return target_close_date

By executing df['Target Close Date'] = df.apply(lambda row: get_target_close_date(row['Severity'], row['First Discovered']), axis=1), the dataframe updates correctly:

         df
Id    Severity    First Discovered    Target Close Date
0      Low            1/1/2021             1/31/2021
1      Medium         1/1/2021             3/2/2021
2      Medium         1/1/2021             3/2/2021

However, if the dataframe is empty, the code fails with ValueError: Wrong number of items passed 3, placement implies 1. Ideally, I want to add an if else condition to the lambda function that checks whether the dataframe is empty, something like:

df['Target Close Date'] = df.apply(
    lambda row: get_target_close_date(row['Severity'], row['First Discovered']) if not df.empty else pass,
    axis=1)

However, this keeps raising a syntax error. I would prefer to write the if else conditional inside the lambda function rather than as an if else statement that spans multiple lines.

wns2rx
  • What do you expect `... else pass` to *do*? Conditional expressions have to evaluate to *something*, so maybe `... else None`? In any case, why do you want to use a lambda expression specifically? – juanpa.arrivillaga Feb 16 '21 at 18:56
  • Good point. I guess I just don't want anything to happen (i.e., if it's an empty dataframe, don't create the target close date column, because that is what makes it fail). To your second point, the lambda function is used many times throughout the script on many dataframes, each with a different name. Finding and replacing what's inside the lambda function, at least to me, would be easier than creating a multi-line if else statement that would need to reference the dataframe's name multiple times. – wns2rx Feb 16 '21 at 19:49
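The point raised in the first comment can be checked directly: both branches of a conditional expression must be expressions, and `pass` is a statement. The following is a minimal sketch of that difference; the helper `compiles` and the toy lambda `double_or_none` are hypothetical names used only for illustration, not code from the question.

```python
def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False


# `pass` is a statement, so it cannot be the else-branch of a conditional expression.
bad = "f = lambda x: x * 2 if x else pass"
# `None` is an expression, so this form is valid.
good = "f = lambda x: x * 2 if x else None"

print(compiles(bad))   # False
print(compiles(good))  # True

# The valid form evaluates to None when the condition is false.
double_or_none = lambda x: x * 2 if x else None
print(double_or_none(3))  # 6
print(double_or_none(0))  # None
```

Note that returning None only changes what each call yields; it does not by itself stop a column from being created.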

2 Answers

One helpful change might be to use built-in pandas tools rather than a custom function and a lambda expression. For example, you could map each severity to a time offset:

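import pandas as pd

# Days to add for each severity level
d = {'Low': pd.Timedelta(days=30),
     'Medium': pd.Timedelta(days=60)}

# Assumes 'First Discovered' already holds datetime values
df['Target Close Date'] = df['First Discovered'] + df['Severity'].map(d)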

Giving:

  Severity First Discovered Target Close Date
0      Low       2021-01-01        2021-01-31
1   Medium       2021-01-01        2021-03-02
2   Medium       2021-01-01        2021-03-02
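For anyone wanting to reproduce this end to end, here is a self-contained sketch of the same mapping idea. It assumes the dates arrive as month/day/year strings and converts them first; the extra "High" row and the name `offsets` are hypothetical, added only to show what happens to a severity that has no entry in the dictionary.

```python
import pandas as pd

df = pd.DataFrame({
    "Severity": ["Low", "Medium", "High"],  # "High" is a made-up, unmapped level
    "First Discovered": ["1/1/2021", "1/1/2021", "1/1/2021"],
})

# Convert the strings to real datetimes so date arithmetic is vectorized.
df["First Discovered"] = pd.to_datetime(df["First Discovered"], format="%m/%d/%Y")

offsets = {"Low": pd.Timedelta(days=30), "Medium": pd.Timedelta(days=60)}

# Severities missing from the dictionary map to a missing offset,
# so their target date comes out as NaT instead of raising.
df["Target Close Date"] = df["First Discovered"] + df["Severity"].map(offsets)

print(df)
```

Whether a missing date is acceptable for unknown severities depends on the data; adding the remaining levels to the dictionary avoids it.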

Now for handling empty DataFrames. If the df has the correct columns (Severity, First Discovered) but no rows, this will work (Target Close Date is added as an empty column). If the df is totally empty (i.e. no named columns), the column lookup will raise a KeyError. But you could add a simple check:

if not df.empty:
    df['Target Close Date'] = df['First Discovered'] + df['Severity'].map(d)

And you could further wrap this in a function to apply to many different DataFrames:

def process(df):
    d = {'Low': pd.Timedelta(days=30),
         'Medium': pd.Timedelta(days=60)}

    # Skip empty DataFrames; otherwise add the column in place
    if not df.empty:
        df['Target Close Date'] = df['First Discovered'] + df['Severity'].map(d)

for df in df_list:
    process(df)
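A variation on the wrapper above, offered as a sketch rather than a drop-in replacement: instead of testing `df.empty`, it checks that the required columns exist, so a frame with the right columns but no rows still gets an (empty) Target Close Date column, while a frame with no columns is left alone. The names `add_target_close_date`, `SEVERITY_OFFSETS`, `REQUIRED_COLUMNS` and the three sample frames are hypothetical.

```python
import pandas as pd

SEVERITY_OFFSETS = {"Low": pd.Timedelta(days=30), "Medium": pd.Timedelta(days=60)}
REQUIRED_COLUMNS = {"Severity", "First Discovered"}


def add_target_close_date(df):
    """Add 'Target Close Date' in place when the required columns exist."""
    if not REQUIRED_COLUMNS.issubset(df.columns):
        return df  # nothing to compute from; leave the frame untouched
    first_discovered = pd.to_datetime(df["First Discovered"])
    df["Target Close Date"] = first_discovered + df["Severity"].map(SEVERITY_OFFSETS)
    return df


populated = pd.DataFrame({
    "Severity": ["Low"],
    "First Discovered": pd.to_datetime(["2021-01-01"]),
})
no_rows = pd.DataFrame(columns=["Severity", "First Discovered"])
no_columns = pd.DataFrame()

# One call per frame, regardless of what each frame is named.
for frame in (populated, no_rows, no_columns):
    add_target_close_date(frame)

print(populated)
print(list(no_rows.columns))
print(list(no_columns.columns))
```

Because the call site is a single function call, changing the logic later means editing one function body rather than every lambda in the script.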

I think the main benefit here is getting to use built-in vectorized operations rather than a row-by-row apply, which is effectively a loop; it should be much more efficient with larger datasets (see here for example).

Tom

Using the built-in pandas tools and the map approach produced a warning (PerformanceWarning: Adding/subtracting object-dtype array to TimedeltaArray not vectorized), but it appeared to work. However, I believe the problem was inside the lambda function: the check against the dataframe had no effect there, so I actually just needed to check whether the row itself was empty. The code below fixed the issue.

df['Target Close Date'] = df.apply(
    lambda row: None if row.empty else get_target_close_date(row['Severity'], row['First Discovered']),
    axis=1)
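On the PerformanceWarning mentioned above: it generally indicates that one operand is stored with object dtype rather than as real datetimes. One possible cause, assumed here and not confirmed by the post, is that First Discovered holds Python date objects. A minimal sketch of converting the column with pd.to_datetime before doing the arithmetic (the sample data and the names `raw_dtype` and `offsets` are hypothetical):

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame({
    "Severity": ["Low", "Medium"],
    # Plain Python date objects are stored with object dtype.
    "First Discovered": [dt.date(2021, 1, 1), dt.date(2021, 1, 1)],
})
raw_dtype = df["First Discovered"].dtype
print(raw_dtype)  # object

# Convert once so the addition below operates on a datetime column.
df["First Discovered"] = pd.to_datetime(df["First Discovered"])

offsets = {"Low": pd.Timedelta(days=30), "Medium": pd.Timedelta(days=60)}
df["Target Close Date"] = df["First Discovered"] + df["Severity"].map(offsets)

print(df.dtypes)
```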
wns2rx