Dataframe to start with:
df = pd.DataFrame({
# some ways to create random data
'User Story Id':np.random.choice( ['US111111','US111112','US111113','US222221','US222222','US222223'], 6,replace=False),
'Feature Id':np.random.choice( ['F999999','F888888'], 6),
'Sprint Label':np.random.choice( ['ABC 23.1.1'], 6),
'Team name':np.random.choice( ['panda','python','shark'], 6),
# a date range and set of random dates
'story_being_groomed_ts':pd.date_range('1/10/2023 8:07:21 AM', periods=6, freq='D'),
'story_in_progress_ts':np.random.choice( pd.date_range('1/10/2023 8:07:21 AM', periods=10,
freq='D'), 6, replace=False),
'story_complete_ts':np.random.choice( pd.date_range('1/20/2023 8:07:21 AM', periods=10,
freq='D'), 6, replace=False),
'story_accepted_ts':np.random.choice( pd.date_range('1/30/2023 8:07:21 AM', periods=10,
freq='D'), 6, replace=False),
'story_release_to_prod_ts':np.random.choice( pd.date_range('2/10/2023 8:07:21 AM', periods=10,
freq='D'), 6, replace=False)
})
I want to add three additional columns:
'Lead_time_1': Count weekdays between 'story_in_progress_ts' and 'story_being_groomed_ts'
'Lead_time_2': Count weekdays between 'story_accepted_ts' and 'story_in_progress_ts'
'Lead_time_3': Count weekdays between 'story_release_to_prod_ts' and 'story_accepted_ts'
Each User Story Id has one set of date-timestamp columns. For each User Story Id, I want to calculate the Lead times as mentioned above:
Key row conditions that need to be met for the solution to work are below.
- Exclude weekend days from lead time calculation
- If for Lead_time_3 calc the 'story_release_to_prod_ts' is blank (meaning story is accepted but not release to production) then Lead_time_3 calc should be 'blank'
- If for Lead_time_2 calc the 'story_accepted_ts' is blank (meaning story is in progress but not accepted by product owner) then Lead_time_2 calc should be 'blank'
- If for Lead_time_1 calc the 'story_in_progress_ts' is blank (meaning story is being groomed but not made in progress by developer) then Lead_time_1 calc should be 'blank'
- Lead_time_1 | Lead_time_2 | Lead_time_3 calcs should be blank if the story does not have valid date time stamp under 'story_in_progress_ts' column.
My expected result:
User Story Id | Feature Id | Sprint Label | Team name | story_being_groomed_ts | story_in_progress_ts | story_complete_ts | story_accepted_ts | story_release_to_prod_ts | Lead Time 1 |Lead Time 2 | Lead Time 3 |
I have tried building the Datetime difference column using the below.
df[['story_being_groomed_ts','story_in_progress_ts']] = df[['story_being_groomed_ts','story_in_progress_ts']].apply(pd.to_datetime) #if conversion required
df['Lead Time'] = (df['story_in_progress_ts'] - df['story_being_groomed_ts']).dt.days