2

I have a data:

ID closed set message_time
aaa 22:00 19:00 19:05
aaa 22:00 19:30 19:40
aaa 22:00 20:00 21:00
bbb 23:00 22:00 22:10
ccc 22:00 19:00 19:05
ccc 22:00 19:30 19:40

code is here:

import pandas as pd
df = pd.DataFrame({
    'ID': ['aaa', 'aaa', 'aaa', 'bbb', 'ccc', 'ccc'],
    'closed': ['22:00', '22:00', '22:00', '23:00', '22:00', '22:00'],
    'set': ['19:00', '19:30', '20:00', '22:00', '19:00', '19:30'],
    'message_time': ['19:05', '19:40', '21:00', '22:10', '19:05', '19:40']
})

I need to create new column named "newtime" based on several conditions:

We have temp_df for each of id, so length of temp_df where id = aaa is 3, for bbb is 1, for ccc is 2;
if length of temp_df == 1 then newtime = closed - set
if length of temp_df > 1 then
     for rows > 0 and < length
         newtime = set[row+1] - set[row]
     if row == length then newtime = closed[row] - set[row]

What is the best way ro realize that?

P. S. The needed output is:

ID closed set message_time newtime
aaa 22:00 19:00 19:05 30 min
aaa 22:00 19:30 19:40 30 min
aaa 22:00 20:00 21:00 120 min
bbb 23:00 22:00 22:10 60 min
ccc 22:00 19:00 19:05 30 min
ccc 22:00 19:30 19:40 150 min
luibrain
  • 63
  • 7

1 Answers1

1

Convert to_timedelta and fillna the last row per group with the difference:

c = pd.to_timedelta(df['closed']+':00')
s = pd.to_timedelta(df['set']+':00')

df['newtime'] = s.groupby(df['ID']).diff(-1).mul(-1).fillna(c-s)

Output:

    ID closed    set message_time         newtime
0  aaa  22:00  19:00        19:05 0 days 00:30:00
1  aaa  22:00  19:30        19:40 0 days 00:30:00
2  aaa  22:00  20:00        21:00 0 days 02:00:00
3  bbb  23:00  22:00        22:10 0 days 01:00:00
4  ccc  22:00  19:00        19:05 0 days 00:30:00
5  ccc  22:00  19:30        19:40 0 days 02:30:00

If you need output as minutes:

df['newtime'] = (s.groupby(df['ID']).diff(-1).mul(-1)
                  .fillna(c-s)
                  .dt.total_seconds().div(60)
                )

Output:

    ID closed    set message_time  newtime
0  aaa  22:00  19:00        19:05     30.0
1  aaa  22:00  19:30        19:40     30.0
2  aaa  22:00  20:00        21:00    120.0
3  bbb  23:00  22:00        22:10     60.0
4  ccc  22:00  19:00        19:05     30.0
5  ccc  22:00  19:30        19:40    150.0
mozway
  • 194,879
  • 13
  • 39
  • 75
  • Thanks! But it is not correct for index 0 and index 4, it shoulde be 30 min instead of 3 hours. for the first row of each id it counts as set[row+1] - set[row] if length >1 – luibrain Apr 05 '23 at 09:17
  • OK, I assumed otherwise you wrote `rows > 0` not `rows >= 0`. I'll update ;) – mozway Apr 05 '23 at 09:22
  • it works, thank you a lot! but don't you know, is it possible to replace calendar time to business time using https://pypi.org/project/business-duration/? – luibrain Apr 05 '23 at 09:32
  • I'm not sure I get your remark – mozway Apr 05 '23 at 09:34
  • fillna(c-s) is using difference between 2 to_timedelta in calendar hours / minutes, is it possible to get the difference in business hours / minutes using this library? – luibrain Apr 05 '23 at 09:50
  • @luibrain got it, yes probably but this should be a separate question. Please give it a try first. Note that with your current data this would not be possible as you're missing the date information. Also see [this Q/A](https://stackoverflow.com/questions/41543747/finding-time-difference-including-business-days-and-business-hours-in-pandas) – mozway Apr 05 '23 at 09:53