Reporting in pandas - cumulative sum of open issues

Question

I have a list of tickets with data on: ticket name, created date, status, closed date.

A new column will be calculated based on created / closed date. If a new ticket with the status = open is created that month, the new column value will increase by one. Value will decrease by one on the month the ticket is moved to a closed status.

How would I set up a Df with the index described above, and how would I go about doing the cumulative calculation in pandas? I'm specifially struggling with setting index as a time series of dates and and having the issues show up on the correct row Starting Data:

          ID        Created Date Closed Date
0     FND-1974 2021-10-18 00:00:00  2022-03-31
1    FND-10310 2021-10-18 00:00:00  2022-03-31
2    FND-10310 2021-10-18 00:00:00  2022-03-31
3    FND-10310 2021-07-21 00:00:00         NaT
4     FND-9862 2021-07-20 00:00:00  2022-02-28
..         ...                 ...         ...
100         41 2020-04-13 13:34:39         NaT
101         40 2020-04-13 13:32:14         NaT
102         35 2020-04-01 17:48:23         NaT
103         18 2020-01-21 16:08:54         NaT
104          4 2020-02-25 14:56:37         NaT

Current approach:

    df = pd.DataFrame(index= pd.Series(pd.date_range('2021-7-1', dt.date.today(),freq="D")))
    df['ID'] = df_agg['Exception_ID']
    df['Created Date'] = df_agg['Created_On_Date']
    df['Closed Date'] = df_agg['Closed_Date']
    df['count'] = 0
    for index, row in df.iterrows():
        if index >= row['Created Date']:
            row['count'] += 1
        if index >= row['Closed Date']:
            row['count'] -= 1
    print(df.head)

Output:

ID Created Date Closed Date  count
2021-07-01  NaN          NaT         NaT      0
2021-07-02  NaN          NaT         NaT      0
2021-07-03  NaN          NaT         NaT      0
2021-07-04  NaN          NaT         NaT      0
2021-07-05  NaN          NaT         NaT      0
        ...          ...         ...    ...
2022-03-20  NaN          NaT         NaT      0
2022-03-21  NaN          NaT         NaT      0
2022-03-22  NaN          NaT         NaT      0
2022-03-23  NaN          NaT         NaT      0
2022-03-24  NaN          NaT         NaT      0

Obviously, I would like to populate the row when the ID, Created Date, Closed Date and add 1 to our count when an issue has been opened on a specific day. im losing it trying to figure this out

This can be achieved using the groupby and count methods. Give it a try and provide your data and code. We’re not going to write it for you! — Bill, Mar 23 '22 at 22:07
Will do when I get home. Just wanted to get the text part of the question out to be productive while commuting. I’ll give it a shot and report back — WAHISHON, Mar 23 '22 at 22:09
Also, did you check for existing questions asking the same thing or similar? This is a very common operation. — Bill, Mar 23 '22 at 22:49
I did. could not find anything to get cumulative sums based off of dates — WAHISHON, Mar 24 '22 at 13:20

Bill · Accepted Answer · 2022-03-25T18:29:31.063

Count number of events by month as described here:

How to group and count rows by month and year using Pandas

Then calculate a cumulative sum using cumsum.

import numpy as np
import pandas as pd

# Prepare some data
dates = np.random.choice(pd.date_range('2020-01-01', '2021-10-31'), size=100)
data = {'ID': [f"FND-{i}" for i in range(100)], 
        'Created Date': dates,
        'Closed Date': dates + pd.to_timedelta(np.random.poisson(60, size=100), unit='D')}
df_agg = pd.DataFrame(data)
# Add some NaT values
df_agg.loc[df_agg['Closed Date'] > '2021-10-31', 'Closed Date'] = None

# Make a dataframe of monthly stats
index = pd.period_range('2021-01', '2021-10', freq='M', name='Month')
monthly_summary = pd.DataFrame(index=index)
monthly_summary['Opened'] = df_agg['ID'].groupby(df_agg['Created Date'].dt.to_period('M')).count()
monthly_summary['Closed'] = df_agg['ID'].groupby(df_agg['Closed Date'].dt.to_period('M')).count()
monthly_summary = monthly_summary.fillna(0).astype(int)
monthly_summary['Net Change'] = monthly_summary['Opened'] - monthly_summary['Closed']

# Calculate cumulative sum of open issues
start_count = 50
monthly_summary['Month-end Count'] = start_count + monthly_summary['Net Change'].cumsum()
print(monthly_summary)

         Opened  Closed  Net Change  Month-end Count
Month                                               
2021-01       2       7          -5               45
2021-02       2       6          -4               41
2021-03      11       2           9               50
2021-04      11       3           8               58
2021-05       6      11          -5               53
2021-06       3      10          -7               46
2021-07       5       5           0               46
2021-08       1       4          -3               43
2021-09       6       5           1               44
2021-10       4       1           3               47

Hi Bill, used your answer but went in a bit of a different direction. ``` — WAHISHON, Mar 28 '22 at 15:48

score 0 · Answer 2 · answered Mar 28 '22 at 15:57

Bills answer provides a great tabular view of the data, and I would recommend using that.

 I was able to get by with the below:
df = pd.DataFrame()
    df['ID'] = df_agg['Exception_ID']
    df['Created Date'] = df_agg['Created_On_Date']
    df['Closed Date'] = df_agg['Closed_Date']
    df = df['Created Date'].groupby([df['Created Date'].dt.month, df['Created Date'].dt.year]).agg('count')
    df = df.to_frame()
    df['date'] = df.index
    dates = df['date']
    date_format = []
    for i in dates:
        value = month_list[i[0]]+', '+str(i[1])
        date_format.append(value)
    df['dates1'] = date_format

output:

Created Date Created Date                          
1            2020                     1   (1, 2020)
             2021                     2   (1, 2021)
             2022                     3   (1, 2022)
2            2020                     1   (2, 2020)
             2021                     6   (2, 2021)
             2022                     1   (2, 2022)
3            2021                     7   (3, 2021)
4            2020                     9   (4, 2020)
             2021                     3   (4, 2021)
5            2020                     2   (5, 2020)
             2021                     6   (5, 2021)
6            2020                     4   (6, 2020)
             2021                     2   (6, 2021)
7            2020                     3   (7, 2020)
             2021                     8   (7, 2021)
8            2020                     2   (8, 2020)
             2021                     5   (8, 2021)
9            2020                     4   (9, 2020)
10           2020                     6  (10, 2020)
             2021                     8  (10, 2021)
11           2020                     5  (11, 2020)
             2021                    10  (11, 2021)
12           2018                     1  (12, 2018)
             2020                     3  (12, 2020)
             2021                     3  (12, 2021)

Reporting in pandas - cumulative sum of open issues

2 Answers2