Objective: I need to show the trend in ageing of issues. e.g. for each date in 2021 show the average age of the issues that were open as at that date.
Starting data (Historic issue list):. "df"
ref | Created | resolved |
---|---|---|
a1 | 20/4/2021 | 15/5/2021 |
a2 | 21/4/2021 | 15/6/2021 |
a3 | 23/4/2021 | 15/7/2021 |
Endpoint: "df2"
Date | Avg_age |
---|---|
1/1/2021 | x |
2/1/2021 | y |
3/1/2021 | z |
where x,y,z are the calculated averages of age for all issues open on the Date.
Tried so far: I got this to work in what feels like a very poor way.
- create a date range (pd.date_range(start,finish,freq="D")
- I loop through the dates in this range and for each date I filter the "df" dataframe (boolean filtering) to show only issues live on the date in question. Then calc age (date - created) and average for those. Each result appended to a list.
- once done I just convert the list into a series for my final result, which I can then graph or whatever.
hist_dates = pd.date_range(start="2021-01-01",end="2021-12-31"),freq="D")
result_list = []
for each_date in hist_dates:
f1=df.Created < each_date #filter1
f2=df.Resolved >= each_date #filter2
df['Refdate'] = each_date #make column to allow refdate-created
df['Age']= (df.Refdate - df.Created)
results_list.append(df[f1 & f2]).Age.mean())
Problems: This works, but it feels sloppy and it doesn't seem fast. The current data-set is small, but I suspect this wouldn't scale well. I'm trying not to solve everything with loops as I understand it is a common mistake for beginners like me.