-1

I have a dataframe which looks like this:

A  B   Start_Date
1  4   2003-05-22
2  6   2003-05-31
....
57 406 2018-09-08

I want to get the value which is at or after a few years from the Start_Date. For instance I want to know the value of column B which will be at a date less than or equal to 10 years from the Start_Date for the corresponding value. So this will look something like this:

A  B   Start_Date D
1  4   2003-05-22 <value of B on or before (last value before) 2013-05-22>
2  6   2003-05-31 <value of B on or before (last value before) 2013-05-31>
....
57 406 2018-09-08 <value of B on or before (last value before) 2028-09-08>

When I try something like this ('Start_Date plus 10' is just another column with 10 years added to the Start_Date column)

df['D']=df[df['Start Date']<=df['Start_Date plus 10']]['B'].max()

It just gives out the maximum value for column B which is understandable, however not my end objective. Please help with suggestions on this. Please let me know if there is ambiguity in the question or if anything needs to be clarified further. Thank you for taking the time to read this and answer the question.

Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • does that mean you have for example dates between 2000 and 2040 and you would like to get all entries where the date is 2020 +/- 10 years ? – s.blnc Jul 07 '22 at 18:06
  • Hi, thank you reading and reaching out. What you're saying is correct. However I wanted to add that the dates won't be continuous (although we can make it that way by inserting dates in between) and I would only need the entries either 10 years from or before the start date. – Gopal Sharma Jul 08 '22 at 05:15
  • have you seen the solution I added? does this help? – s.blnc Jul 12 '22 at 00:31
  • Hi, my apologies for the late reply. I've found another way for this. Will share soon – Gopal Sharma Jul 27 '22 at 03:54

1 Answers1

0

I am not sure if this is exactly what you need, let me know if it does not work. But for example if you have a DataFrame like:

tempDF = pd.DataFrame({'dates': ['2003-05-20', 
                                 '2003-05-21', 
                                 '2003-05-22', 
                                 '2003-05-23', 
                                 '2003-05-24', 
                                 '2003-05-25']})

and you define your dates like:

   min_date = '2003-05-21'
   max_date = '2003-05-23'

you have different options. You can either use somehing getting firstly all entries over a specific date and then again use a subsample of that by filtering all unter a specific date.

   filteredDF = tempDF[tempDF['dates']>=min_date][tempDF['dates'] <=max_date]

or you could use the 'query' function (like it is explained here here

   filteredDF =tempDF.query('dates >= @min_date').query('dates <= @max_date')
s.blnc
  • 76
  • 6