repeated pandas series based on time

Question

I have a pandas data frame that looks something like:

import pandas as pd
import numpy as np

d = {
    'original tenor': [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
    'residual tenor': [5, 4, 3, 2, 1, 10, 9, 8, 7, 6, 5],
    'date': ['01/01/2018', '02/01/2018', '03/01/2018', '04/01/2018', '05/01/2018', '06/01/2018',
             '07/01/2018', '08/01/2018', '09/01/2018', '10/01/2018', '11/01/2018'],
}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')

df

The residual tenor decreases as the date advances (by one per day in the example above). When the residual tenor reaches 1, the next value resets to the original tenor. I am trying to find a formula that populates the residual tenor given the original tenor and a known starting residual tenor. So, given the following data frame, I would expect the NaN to be replaced by 5:

d = {
    'original tenor': [10, 10],
    'residual tenor': [5, np.nan],
    'date': ['01/01/2018', '11/01/2018'],
}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')

df

If you just want to fill NaN-s with previous values - that's the answer: https://stackoverflow.com/questions/27905295/how-to-replace-nans-by-preceding-values-in-pandas-dataframe (you may have to do sorting by date first to ensure previous - is really previous) — Grzegorz Skibinski, Aug 29 '19 at 09:41

nostradamus · Accepted Answer · 2019-08-29T14:40:51.913

I had to read the question a few times, but I think the following code produces the desired output:

import pandas as pd
import numpy as np

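d = {
    'original tenor': [10, 10],
    'residual tenor': [5, np.nan],
    'date': ['01/01/2018', '11/01/2018'],
}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')

# Days elapsed since the first row, whose residual tenor is known.
days = (df['date'].iloc[1:] - df['date'].iloc[0]) / np.timedelta64(1, 'D')
# Shift by 1 before and after the modulo so the result cycles through 10..1 and never becomes 0.
# .loc avoids chained assignment, which may not write back to df.
df.loc[1:, 'residual tenor'] = (df['residual tenor'].iloc[0] - days - 1) % 10 + 1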

df

Apart from supplying np.nan for the example data, numpy is only needed here to convert the time difference into days.
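A quick way to check a wrap-around formula like this is to run it against the full eleven-row series from the question. The sketch below uses the shifted form (r - days - 1) % 10 + 1, which keeps the result in the range 1 to 10; a plain % 10 would yield 0 on the day where the series shows 10. The names check, days, expected, and computed are made up for illustration.

```python
import pandas as pd

# The eleven daily dates and residual tenors from the question.
dates = pd.to_datetime(
    ["01/01/2018", "02/01/2018", "03/01/2018", "04/01/2018", "05/01/2018",
     "06/01/2018", "07/01/2018", "08/01/2018", "09/01/2018", "10/01/2018",
     "11/01/2018"],
    format="%d/%m/%Y",
)
expected = [5, 4, 3, 2, 1, 10, 9, 8, 7, 6, 5]

check = pd.DataFrame({"date": dates, "expected": expected})

# Whole days elapsed since the first row.
days = (check["date"] - check["date"].iloc[0]).dt.days

# Subtract 1 before the modulo and add it back afterwards,
# so the values cycle through 10, 9, ..., 1 instead of 9, ..., 0.
check["computed"] = (check["expected"].iloc[0] - days - 1) % 10 + 1

print((check["computed"] == check["expected"]).all())
```

Here .dt.days is used instead of dividing by np.timedelta64(1, 'D'); both give the day count, but .dt.days returns integers.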

Edit regarding the OP's comment:

Are you familiar with the modulo operation (% in Python)? It is often useful when numbers repeat in a cycle. With a little rearranging, the same idea handles a different stop value:

import pandas as pd
import numpy as np

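d = {
    'original tenor': [10, 10, 10, 10, 10, 10],
    'residual tenor': [5, np.nan, np.nan, np.nan, np.nan, np.nan],
    'date': ['01/01/2018', '03/01/2018', '04/01/2018', '05/01/2018', '06/01/2018', '11/01/2018'],
}

df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')

stoptenor = 2
# Days elapsed since the first row, whose residual tenor is known.
days = (df['date'].iloc[1:] - df['date'].iloc[0]) / np.timedelta64(1, 'D')
# The cycle runs from 10 down to stoptenor, so it has 11 - stoptenor distinct values.
# .loc avoids chained assignment, which may not write back to df.
df.loc[1:, 'residual tenor'] = (df['residual tenor'].iloc[0] - days - stoptenor) % (11 - stoptenor) + stoptenor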

df

Your pattern still repeats, but with a different "offset" (stoptenor), so the modulo has to be adjusted in two places: the cycle now has 11 - stoptenor values (10 down to stoptenor), and the result is shifted up by stoptenor. For clarity, I increased the number of data points.
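If the original tenor is not always 10, the hard-coded 11 can be replaced by the 'original tenor' column plus one. The following is only a sketch under a few assumptions: the helper name fill_residual_tenor is hypothetical, the earliest row is assumed to hold the known residual tenor, and the tenor is assumed to drop by one per calendar day.

```python
import numpy as np
import pandas as pd


def fill_residual_tenor(df, stop_tenor=1):
    """Return a copy of df, sorted by date, with 'residual tenor' filled in.

    Assumes the earliest row carries the known residual tenor and that the
    tenor falls by one per day, resetting to 'original tenor' after reaching
    stop_tenor.
    """
    out = df.sort_values("date").reset_index(drop=True)
    days = (out["date"] - out["date"].iloc[0]).dt.days
    # Number of distinct values in one cycle: original tenor down to stop_tenor.
    cycle = out["original tenor"] - stop_tenor + 1
    start = out["residual tenor"].iloc[0]
    out["residual tenor"] = (start - days - stop_tenor) % cycle + stop_tenor
    return out


example = pd.DataFrame({
    "original tenor": [10, 10, 10, 10, 10, 10],
    "residual tenor": [5, np.nan, np.nan, np.nan, np.nan, np.nan],
    "date": ["01/01/2018", "03/01/2018", "04/01/2018",
             "05/01/2018", "06/01/2018", "11/01/2018"],
})
example["date"] = pd.to_datetime(example["date"], format="%d/%m/%Y")

filled = fill_residual_tenor(example, stop_tenor=2)
print(filled["residual tenor"].tolist())
# [5.0, 3.0, 2.0, 10.0, 9.0, 4.0]
```

With the default stop_tenor=1 the same helper reproduces the original 10, 9, ..., 1, 10 cycle.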

I was wondering how the formula you suggested would change if the tenor was forced to stop at a particular number. For example, if I set the minimum at 2, the tenor goes from 10,9,8,7,6,5,4,3,2,10,9,8,7,6,5,4,3,2,10,9..... etc... — Number Logic, Aug 29 '19 at 14:02
