0

Dataframe to start with:

df = pd.DataFrame({

    # some ways to create random data
    'User Story Id':np.random.choice( ['US111111','US111112','US111113','US222221','US222222','US222223'], 6,replace=False),
    'Feature Id':np.random.choice( ['F999999','F888888'], 6),
    'Sprint Label':np.random.choice( ['ABC 23.1.1'], 6),
    'Team name':np.random.choice( ['panda','python','shark'], 6),


    # a date range and set of random dates
    'story_being_groomed_ts':pd.date_range('1/10/2023  8:07:21 AM', periods=6, freq='D'),
    'story_in_progress_ts':np.random.choice( pd.date_range('1/10/2023  8:07:21 AM', periods=10,
                          freq='D'), 6, replace=False),
    'story_complete_ts':np.random.choice( pd.date_range('1/20/2023  8:07:21 AM', periods=10,
                          freq='D'), 6, replace=False),
    'story_accepted_ts':np.random.choice( pd.date_range('1/30/2023  8:07:21 AM', periods=10,
                          freq='D'), 6, replace=False),
    'story_release_to_prod_ts':np.random.choice( pd.date_range('2/10/2023  8:07:21 AM', periods=10,
                          freq='D'), 6, replace=False)
    })

I want to add three additional columns:

'Lead_time_1': Count weekdays between 'story_in_progress_ts' and 'story_being_groomed_ts'

'Lead_time_2': Count weekdays between 'story_accepted_ts' and 'story_in_progress_ts'

'Lead_time_3': Count weekdays between 'story_release_to_prod_ts' and 'story_accepted_ts'

Each User Story Id has one set of date-timestamp columns. For each User Story Id, I want to calculate the Lead times as mentioned above:

Key row conditions that need to be met for the solution to work are below.

  • Exclude weekend days from lead time calculation
  • If for Lead_time_3 calc the 'story_release_to_prod_ts' is blank (meaning story is accepted but not release to production) then Lead_time_3 calc should be 'blank'
  • If for Lead_time_2 calc the 'story_accepted_ts' is blank (meaning story is in progress but not accepted by product owner) then Lead_time_2 calc should be 'blank'
  • If for Lead_time_1 calc the 'story_in_progress_ts' is blank (meaning story is being groomed but not made in progress by developer) then Lead_time_1 calc should be 'blank'
  • Lead_time_1 | Lead_time_2 | Lead_time_3 calcs should be blank if the story does not have valid date time stamp under 'story_in_progress_ts' column.

My expected result:

User Story Id | Feature Id | Sprint Label | Team name | story_being_groomed_ts | story_in_progress_ts | story_complete_ts | story_accepted_ts | story_release_to_prod_ts | Lead Time 1 |Lead Time 2 | Lead Time 3 |

I have tried building the Datetime difference column using the below.

df[['story_being_groomed_ts','story_in_progress_ts']] = df[['story_being_groomed_ts','story_in_progress_ts']].apply(pd.to_datetime) #if conversion required
df['Lead Time'] = (df['story_in_progress_ts'] - df['story_being_groomed_ts']).dt.days
Leo82
  • 3
  • 4
  • 1
    Does this answer your question? [how to use (np.busday\_count) with pandas.core.series.Series](https://stackoverflow.com/questions/54234021/how-to-use-np-busday-count-with-pandas-core-series-series) – Michael Cao Mar 07 '23 at 16:02
  • Hi Michael ,thanks for sharing. I went thru the above post. It does not answer my question. Side-bar: Do you know how to format the expected output, when I paste a table of expected output, the table header and row values align properly. I tried formatting this mutiple times, could not get it right. – Leo82 Mar 07 '23 at 16:25

1 Answers1

0

You can use pd.apply row-wise, and generate the 3 columns you need. You can use dropna on the subset of columns to make sure that all the values that are not dates will be removed.

I used the data you proposed, but i inserted a null value in there to check if it would work properly.

df.iloc[3,5] = ''

df['lead_time_1'] = df[['story_being_groomed_ts', 'story_in_progress_ts']].dropna().apply(lambda x: pd.date_range(x[0], x[1], freq='B').shape[0],
                                                                                                                       axis=1)

df['lead_time_2'] = df[['story_in_progress_ts', 'story_accepted_ts']].dropna().apply(lambda x: pd.date_range(x[0], x[1], freq='B').shape[0],
                                                                                                                       axis=1)

df['lead_time_3'] = df[['story_accepted_ts', 'story_release_to_prod_ts']].dropna().apply(lambda x: pd.date_range(x[0], x[1], freq='B').shape[0],
                                                                                                                       axis=1)


>>> df.fillna('')[['lead_time_1', 'lead_time_2', 'lead_time_3']] 
# only printing part of the df otherwise is messy to paste here

  lead_time_1 lead_time_2  lead_time_3
0         1.0        22.0            8
1         1.0        16.0            8
2         2.0        14.0            7
3                                   12
4         2.0        15.0            6
5         1.0        15.0            7
Yolao_21
  • 765
  • 1
  • 4
  • 10