update column values based on length temp dataframes based on source dataframe

Question

I have a data:

ID	closed	set	message_time
aaa	22:00	19:00	19:05
aaa	22:00	19:30	19:40
aaa	22:00	20:00	21:00
bbb	23:00	22:00	22:10
ccc	22:00	19:00	19:05
ccc	22:00	19:30	19:40

code is here:

import pandas as pd
df = pd.DataFrame({
    'ID': ['aaa', 'aaa', 'aaa', 'bbb', 'ccc', 'ccc'],
    'closed': ['22:00', '22:00', '22:00', '23:00', '22:00', '22:00'],
    'set': ['19:00', '19:30', '20:00', '22:00', '19:00', '19:30'],
    'message_time': ['19:05', '19:40', '21:00', '22:10', '19:05', '19:40']
})

I need to create new column named "newtime" based on several conditions:

We have temp_df for each of id, so length of temp_df where id = aaa is 3, for bbb is 1, for ccc is 2;
if length of temp_df == 1 then newtime = closed - set
if length of temp_df > 1 then
     for rows > 0 and < length
         newtime = set[row+1] - set[row]
     if row == length then newtime = closed[row] - set[row]

What is the best way ro realize that?

P. S. The needed output is:

ID	closed	set	message_time	newtime
aaa	22:00	19:00	19:05	30 min
aaa	22:00	19:30	19:40	30 min
aaa	22:00	20:00	21:00	120 min
bbb	23:00	22:00	22:10	60 min
ccc	22:00	19:00	19:05	30 min
ccc	22:00	19:30	19:40	150 min

Please provide the exact expected output for clarity – mozway Apr 05 '23 at 08:58 — mozway, Apr 05 '23 at 08:58

mozway · Accepted Answer · 2023-04-05T09:25:43.933

1

Convert to_timedelta and fillna the last row per group with the difference:

c = pd.to_timedelta(df['closed']+':00')
s = pd.to_timedelta(df['set']+':00')

df['newtime'] = s.groupby(df['ID']).diff(-1).mul(-1).fillna(c-s)

Output:

    ID closed    set message_time         newtime
0  aaa  22:00  19:00        19:05 0 days 00:30:00
1  aaa  22:00  19:30        19:40 0 days 00:30:00
2  aaa  22:00  20:00        21:00 0 days 02:00:00
3  bbb  23:00  22:00        22:10 0 days 01:00:00
4  ccc  22:00  19:00        19:05 0 days 00:30:00
5  ccc  22:00  19:30        19:40 0 days 02:30:00

If you need output as minutes:

df['newtime'] = (s.groupby(df['ID']).diff(-1).mul(-1)
                  .fillna(c-s)
                  .dt.total_seconds().div(60)
                )

Output:

    ID closed    set message_time  newtime
0  aaa  22:00  19:00        19:05     30.0
1  aaa  22:00  19:30        19:40     30.0
2  aaa  22:00  20:00        21:00    120.0
3  bbb  23:00  22:00        22:10     60.0
4  ccc  22:00  19:00        19:05     30.0
5  ccc  22:00  19:30        19:40    150.0

edited Apr 05 '23 at 09:25

answered Apr 05 '23 at 09:06

mozway

194,879
13
39
75

Thanks! But it is not correct for index 0 and index 4, it shoulde be 30 min instead of 3 hours. for the first row of each id it counts as set[row+1] - set[row] if length >1 – luibrain Apr 05 '23 at 09:17
OK, I assumed otherwise you wrote `rows > 0` not `rows >= 0`. I'll update ;) – mozway Apr 05 '23 at 09:22
it works, thank you a lot! but don't you know, is it possible to replace calendar time to business time using https://pypi.org/project/business-duration/? – luibrain Apr 05 '23 at 09:32
I'm not sure I get your remark – mozway Apr 05 '23 at 09:34
fillna(c-s) is using difference between 2 to_timedelta in calendar hours / minutes, is it possible to get the difference in business hours / minutes using this library? – luibrain Apr 05 '23 at 09:50
@luibrain got it, yes probably but this should be a separate question. Please give it a try first. Note that with your current data this would not be possible as you're missing the date information. Also see [this Q/A](https://stackoverflow.com/questions/41543747/finding-time-difference-including-business-days-and-business-hours-in-pandas) – mozway Apr 05 '23 at 09:53

update column values based on length temp dataframes based on source dataframe

1 Answers1

Linked