I'm trying to calculate the difference between a date and today in months.

Here is what I have so far:

import pandas as pd
import numpy as np
from datetime import date
def calc_date_countdown(df):
    today = date.today()
    df['countdown'] = df['date'].apply(lambda x: (x-today)/np.timedelta64(1, 'M'))
    df['countdown'] = df['countdown'].astype(int)
    return df

Any pointers on what I'm doing wrong or maybe a more efficient way of doing it?

When I run on my dataset, this is the error I'm getting: TypeError: unsupported operand type(s) for -: 'Timestamp' and 'datetime.date'

Mike Mann

2 Answers

2
import pandas as pd

def calc_date_countdown(df):
    today = pd.Timestamp.today()
    df['countdown'] = df['date'].apply(lambda x: (x - today).days // 30)
    return df

This should work as long as the date column in your dataframe has a datetime dtype, so that each element is a Timestamp. If it doesn't, convert it with pd.to_datetime() before running the function. Note that `// 30` approximates a month as 30 days.
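As a minimal sketch of that conversion step, assuming the date column arrives as strings: the version below converts with pd.to_datetime() and subtracts on the whole column, so both sides of the subtraction are pandas datetimes and the TypeError from the question cannot occur. The `today` parameter and the sample dates are hypothetical additions to make the result reproducible; they are not part of the original answer.

```python
import pandas as pd

def calc_date_countdown(df, today=None):
    # `today` is a hypothetical parameter added for reproducibility;
    # by default it falls back to the current date at midnight.
    today = pd.Timestamp.today().normalize() if today is None else pd.Timestamp(today)
    # Ensure the column is datetime64 before subtracting.
    dates = pd.to_datetime(df["date"])
    out = df.copy()
    # Whole days until each date, floor-divided into 30-day "months".
    out["countdown"] = (dates - today).dt.days // 30
    return out

# Made-up sample data, stored as strings on purpose.
df = pd.DataFrame({"date": ["2023-03-01", "2023-06-15"]})
result = calc_date_countdown(df, today="2023-02-15")
print(result)
```

Returning a copy rather than mutating the input is a design choice of this sketch only; the answer's function modifies the dataframe in place.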

  • I discovered my pandas data was a Timestamp. When I converted `today` to a `pd.Timestamp`, it worked – Mike Mann Feb 15 '23 at 19:17
1

Using apply is not very efficient here, because the calculation can be done as a single array operation on the whole column.

See the below example:

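import numpy as np
import pandas as pd
from datetime import date

def per_array(df):
    # Vectorized: subtract today from the whole column at once (date - today, as in the question)
    df['months'] = ((df['date'] - pd.to_datetime(date.today())) / np.timedelta64(1, 'M')).astype(int)
    return df

def using_apply(df):
    # Same calculation, but one row at a time through apply
    today = date.today()
    df['months'] = df['date'].apply(lambda x: (x-pd.to_datetime(today))/np.timedelta64(1, 'M'))
    df['months'] = df['months'].astype(int)
    return df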

df = pd.DataFrame({'date': [pd.to_datetime(f"2023-0{i}-01") for i in range(1,8)]})
print(df)
#         date
# 0 2023-01-01
# 1 2023-02-01
# 2 2023-03-01
# 3 2023-04-01
# 4 2023-05-01
# 5 2023-06-01
# 6 2023-07-01

Timing it:

%%timeit 
per_array(df)
195 µs ± 5.14 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%%timeit 
using_apply(df)
384 µs ± 3.22 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

As you can see, avoiding apply is around twice as fast on this seven-row dataframe.
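If whole calendar months are wanted rather than average-length months, one vectorized alternative is to work from the year and month components. This is only a sketch under assumptions: `months_until` and its `today` parameter are hypothetical names, the sample dates are made up, and the day of the month is ignored. It also sidesteps np.timedelta64(1, 'M'), which is an average month length and which newer pandas versions may reject as an ambiguous unit.

```python
import pandas as pd

def months_until(dates, today=None):
    # `today` is a hypothetical parameter so the result is reproducible.
    today = pd.Timestamp.today() if today is None else pd.Timestamp(today)
    dates = pd.to_datetime(dates)
    # Calendar-month difference, computed on the whole column at once.
    # The day of the month is ignored.
    return (dates.dt.year - today.year) * 12 + (dates.dt.month - today.month)

# Made-up sample data.
df = pd.DataFrame({"date": pd.to_datetime(["2023-01-01", "2023-07-01", "2024-02-01"])})
df["months"] = months_until(df["date"], today="2023-02-15")
print(df)
```

Dates before `today` come out negative, which matches the date-minus-today direction used in the question.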

JarroVGIT
  • Awesome, thank you for sharing. Probably too broad of a question, but is the array operation a good replacement for `apply` overall? Or are there general scenarios where `apply` works better? I use `apply` with a lambda very often. – Mike Mann Feb 15 '23 at 19:26
  • That is a broad question, but yes: always try to avoid `apply` if it is not required, especially in big dataframes. It is much less efficient than using array operations. See this answer for a very clear explanation on this topic, I liked it a lot :) https://stackoverflow.com/a/54432584/1557060 – JarroVGIT Feb 15 '23 at 19:36