2

Input df

Date1            
2019-01-23      
2020-02-01  

note: The type of Date1 is datetime64[ns].

Goal

I want to calculate month diff between Date1 column and '2019-01-01'.

Try and Ref

I try the answers from this post , but it failed as below:

df['Date1'].dt.to_period('M') - pd.to_datetime('2019-01-01').to_period('M') 
  • note:pandas version: 1.1.5
Jack
  • 1,724
  • 4
  • 18
  • 33

4 Answers4

2

Your solution should be changed by convert periods to integers and for second value is used one element list ['2019-01-01']:

df['new'] = (df['Date1'].dt.to_period('M').astype(int) - 
             pd.to_datetime(['2019-01-01']).to_period('M').astype(int))
print (df)
       Date1  new
0 2019-01-23    0
1 2020-02-01   13

If compare solutions:

rng = pd.date_range('1900-04-03', periods=3000, freq='MS')
df = pd.DataFrame({'Date1': rng})  


In [106]: %%timeit
     ...: date_ref = pd.to_datetime('2019-01-01')
     ...: df["mo_since_2019_01"] = (df.Date1.dt.year - date_ref.year).values*12 + (df.Date1.dt.month - date_ref.month)
     ...: 
1.57 ms ± 8.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [107]: %%timeit
     ...: df['new'] = (df['Date1'].dt.to_period('M').astype(int) - pd.to_datetime(['2019-01-01']).to_period('M').astype(int))
     ...: 
1.32 ms ± 19.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    

Apply are loops under the hood, so slowier:

In [109]: %%timeit
     ...: start = pd.to_datetime("2019-01-01")
     ...: df['relative_months'] = df['Date1'].apply(lambda end: relative_months(start, end, freq="M"))
     ...: 
25.7 s ± 729 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [110]: %%timeit
     ...: rd = df['Date1'].apply(lambda x:relativedelta(x,date(2019,1,1)))
     ...: mon = rd.apply(lambda x: ((x.years * 12) + x.months))
     ...: df['Diff'] = mon
     ...: 
94.2 ms ± 431 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

I think this should work:

date_ref = pd.to_datetime('2019-01-01')
df["mo_since_2019_01"] = (df.Date1.dt.year - date_ref.year).values*12 + (df.Date1.dt.month - date_ref.month)

month_delta = (date2.year - date1.year)*12 + (date2.month - date1.month)

output:

       Date1  mo_since_2019_01
0 2019-01-23                 0
1 2020-02-01                13
anon01
  • 10,618
  • 8
  • 35
  • 58
1

With this solution, you won't need further imports as it simply calculates the length of the pd.date_range() between your fixed start date and varying end date:

def relative_months(start, end, freq="M"):
    if start < end:
        x = len(pd.date_range(start=start,end=end,freq=freq))
    else:
        x = - len(pd.date_range(start=end,end=start,freq=freq))
    return x

start = pd.to_datetime("2019-01-01")
df['relative_months'] = df['Date1'].apply(lambda end: relative_months(start, end, freq="M"))

In your specific case, I think anon01's solution should be the quickest/ favorable; my variant however allows the use of generic frequency strings for date offsets like 'M', 'D', … and allows you to specifically handle the edge case of "negative" relative offsets (i.e. what happens if your comparison date is not earlier than all dates in Date1).

Asmus
  • 5,117
  • 1
  • 16
  • 21
0

Try:

rd = df['Date1'].apply(lambda x:relativedelta(x,date(2019,1,1)))
mon = rd.apply(lambda x: ((x.years * 12) + x.months))
df['Diff'] = mon

Input:

        Date1
0  2019-01-23
1  2020-02-01
2  2020-05-01
3  2020-06-01

Output:

       Date1  Diff
0 2019-01-23     0
1 2020-02-01    13
2 2020-05-01    16
3 2020-06-01    17
sharathnatraj
  • 1,614
  • 5
  • 14
  • Please remember to include the necessary imports, I'm _guessing_: `from dateutil.relativedelta import relativedelta` and `from datetime import date` – Asmus Dec 15 '20 at 07:47