Number of months between two dates while one date is given

Question

Input df

Date1            
2019-01-23      
2020-02-01

note: The type of Date1 is datetime64[ns].

Goal

I want to calculate month diff between Date1 column and '2019-01-01'.

Try and Ref

I try the answers from this post , but it failed as below:

df['Date1'].dt.to_period('M') - pd.to_datetime('2019-01-01').to_period('M')

note:pandas version: 1.1.5

Did you convert the `Date1` to an actual date type with `pd.to_datetime()`? — Paulo Marques, Dec 15 '20 at 00:15
@Jack - So you want to ignore the Year and only calculate the month difference? For.e.g month difference between 2020-02-01 & 2019-01-01 is '1'? — sharathnatraj, Dec 15 '20 at 00:34
@sharathnatraj no, it must conclude year. For your example, it will be 13. — Jack, Dec 15 '20 at 00:35

jezrael · Accepted Answer · 2020-12-15T08:19:45.753

Your solution should be changed by convert periods to integers and for second value is used one element list ['2019-01-01']:

df['new'] = (df['Date1'].dt.to_period('M').astype(int) - 
             pd.to_datetime(['2019-01-01']).to_period('M').astype(int))
print (df)
       Date1  new
0 2019-01-23    0
1 2020-02-01   13

If compare solutions:

rng = pd.date_range('1900-04-03', periods=3000, freq='MS')
df = pd.DataFrame({'Date1': rng})  


In [106]: %%timeit
     ...: date_ref = pd.to_datetime('2019-01-01')
     ...: df["mo_since_2019_01"] = (df.Date1.dt.year - date_ref.year).values*12 + (df.Date1.dt.month - date_ref.month)
     ...: 
1.57 ms ± 8.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [107]: %%timeit
     ...: df['new'] = (df['Date1'].dt.to_period('M').astype(int) - pd.to_datetime(['2019-01-01']).to_period('M').astype(int))
     ...: 
1.32 ms ± 19.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Apply are loops under the hood, so slowier:

In [109]: %%timeit
     ...: start = pd.to_datetime("2019-01-01")
     ...: df['relative_months'] = df['Date1'].apply(lambda end: relative_months(start, end, freq="M"))
     ...: 
25.7 s ± 729 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [110]: %%timeit
     ...: rd = df['Date1'].apply(lambda x:relativedelta(x,date(2019,1,1)))
     ...: mon = rd.apply(lambda x: ((x.years * 12) + x.months))
     ...: df['Diff'] = mon
     ...: 
94.2 ms ± 431 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

anon01 · Answer 2 · 2020-12-15T00:35:56.033

1

I think this should work:

date_ref = pd.to_datetime('2019-01-01')
df["mo_since_2019_01"] = (df.Date1.dt.year - date_ref.year).values*12 + (df.Date1.dt.month - date_ref.month)

month_delta = (date2.year - date1.year)*12 + (date2.month - date1.month)

output:

       Date1  mo_since_2019_01
0 2019-01-23                 0
1 2020-02-01                13

edited Dec 15 '20 at 00:35

answered Dec 15 '20 at 00:28

anon01

10,618
8
35
58

I don't understand your answer , it seems not matching my question. – Jack Dec 15 '20 at 00:31

Asmus · Answer 3 · 2020-12-15T08:01:34.967

With this solution, you won't need further imports as it simply calculates the length of the pd.date_range() between your fixed start date and varying end date:

def relative_months(start, end, freq="M"):
    if start < end:
        x = len(pd.date_range(start=start,end=end,freq=freq))
    else:
        x = - len(pd.date_range(start=end,end=start,freq=freq))
    return x

start = pd.to_datetime("2019-01-01")
df['relative_months'] = df['Date1'].apply(lambda end: relative_months(start, end, freq="M"))

In your specific case, I think anon01's solution should be the quickest/ favorable; my variant however allows the use of generic frequency strings for date offsets like 'M', 'D', … and allows you to specifically handle the edge case of "negative" relative offsets (i.e. what happens if your comparison date is not earlier than all dates in Date1).

score 0 · Answer 4 · answered Dec 15 '20 at 00:43

0

Try:

rd = df['Date1'].apply(lambda x:relativedelta(x,date(2019,1,1)))
mon = rd.apply(lambda x: ((x.years * 12) + x.months))
df['Diff'] = mon

Input:

        Date1
0  2019-01-23
1  2020-02-01
2  2020-05-01
3  2020-06-01

Output:

       Date1  Diff
0 2019-01-23     0
1 2020-02-01    13
2 2020-05-01    16
3 2020-06-01    17

answered Dec 15 '20 at 00:43

sharathnatraj

1,614
5
14

Please remember to include the necessary imports, I'm _guessing_: `from dateutil.relativedelta import relativedelta` and `from datetime import date` – Asmus Dec 15 '20 at 07:47

Number of months between two dates while one date is given

Input df

Goal

Try and Ref

4 Answers4