I have a pandas DataFrame with many rows. Each row contains an object and the machining interval on a certain machine (a start time and an end time). Each object can be processed on several machines in succession, and the intervals can overlap or leave gaps. I need to find the actual duration of each job, i.e. the total time during which the object is being machined on at least one machine. For example:
Object | Machine | T start | T end |
---|---|---|---|
1 | A | 17:26 | 17:57 |
1 | B | 17:26 | 18:33 |
1 | C | 18:56 | 19:46 |
2 | A | 14:00 | 15:00 |
2 | C | 14:30 | 15:00 |
3 | A | 12:00 | 12:30 |
3 | C | 13:00 | 13:45 |
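For reference, the sample data can be reconstructed roughly like this (a minimal sketch; I assume the start/end columns are parsed as datetimes, all on the same day):

```python
import pandas as pd

# Minimal reconstruction of the sample data above (all times assumed on the same day)
df = pd.DataFrame({
    'Object':  [1, 1, 1, 2, 2, 3, 3],
    'Machine': ['A', 'B', 'C', 'A', 'C', 'A', 'C'],
    'T start': pd.to_datetime(['17:26', '17:26', '18:56', '14:00', '14:30', '12:00', '13:00']),
    'T end':   pd.to_datetime(['17:57', '18:33', '19:46', '15:00', '15:00', '12:30', '13:45']),
})
```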
For object 1 the actual duration is 117 minutes (machines A and B overlap and together cover 17:26 to 18:33, i.e. 67 minutes, plus 50 minutes on machine C from 18:56 to 19:46), for object 2 it is 60 minutes and for object 3 it is 75 minutes. I tried a groupby in which I calculated, for each object, the sum of the durations of the individual processes and the minimum and maximum timestamps, i.e. the first start and the last end. Then I wrote a function that compares these values, but while it works for objects 2 and 3 it fails for object 1: the end-to-end span (140 minutes) still contains the idle gap between 18:33 and 18:56, and the plain sum (148 minutes) double-counts the overlap between machines A and B. Here is my solution:
Object | min | max | sumT (minutes) | LT_ACTUAL (minutes) |
---|---|---|---|---|
1 | 17:26 | 19:46 | 148 | 140 ERROR! |
2 | 14:00 | 15:00 | 90 | 60 OK! |
3 | 12:00 | 13:45 | 75 | 75 OK! |
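The min, max and sumT columns were produced with an aggregation along these lines (a rough sketch rather than the exact code; duration is a helper column holding each row's machining time in minutes):

```python
# Machining time of each row in minutes
df['duration'] = (df['T end'] - df['T start']) / pd.Timedelta(minutes=1)

# First start, last end and summed duration per object
df = df.groupby('Object').agg(
    min=('T start', 'min'),
    max=('T end', 'max'),
    sumT=('duration', 'sum'),
).reset_index()
```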
```python
def calc_lead_time(min_t_start, max_t_end, t_sum):
    # Elapsed time from the first start to the last end, in minutes
    t_max_min = (max_t_end - min_t_start) / pd.Timedelta(minutes=1)
    # Take the smaller of the end-to-end span and the summed durations
    if t_max_min <= t_sum:
        return t_max_min
    else:
        return t_sum

df['LT_ACTUAL'] = df.apply(lambda x: calc_lead_time(x['min'], x['max'], x['sumT']), axis=1)
```