0

I have a solved question update column values based on length temp dataframes based on source dataframe

there is a code:

import pandas as pd
df = pd.DataFrame({
    'ID': ['aaa', 'aaa', 'aaa', 'bbb', 'ccc', 'ccc'],
    'closed': ['2023-03-28 22:00', '2023-03-28 22:00', '2023-03-28 22:00', '2023-03-29 23:00', '2023-03-27 22:00', '2023-03-27 22:00'],
    'set': ['2023-03-27 19:00', '2023-03-28 19:30', '2023-03-28 20:00', '2023-03-27 22:00', '2023-03-25 19:00', '2023-03-26 19:30'],
    'message_time': ['19:05', '19:40', '21:00', '22:10', '19:05', '19:40']
})

df['newtime'] = (s.groupby(df['ID']).diff(-1).mul(-1)
                  .fillna(c-s)
                  .dt.total_seconds().div(60)
                )

output is differnce in minutes (fillna(c-s)). I need to replace this with business minutes. I tried but it didn't work. For example, create new column simply using module business_duration:

import holidays as pyholidays
from datetime import time, datetime

holidaylist_RU = pyholidays.Russia(years=[datetime.now().year, datetime.now().year-1])
start_hour = time(10, 0, 0)
end_hour = time(21, 0, 0)
unit_min='min'

DATA_REACTION['reaction (minutes)'] = DATA_REACTION.apply(lambda x: bd.businessDuration(datetime.strptime(x['start'], '%Y-%m-%d %H:%M:%S'), datetime.strptime(x['end'], '%Y-%m-%d %H:%M:%S'), start_hour, end_hour, holidaylist=holidaylist_RU, unit=unit_min), axis=1)

It works for direct applying. How implement this solution to fillna row?

luibrain
  • 63
  • 7
  • 2
    As indicated in your other question, you can't compute business datetime without the date (which you have not here). For instance if a day is a weekend or holiday it won't count the same – mozway Apr 05 '23 at 11:29
  • @mozway updated the code with datetimes – luibrain Apr 05 '23 at 11:33
  • @mozway I have a suggestion. Instead of difference in column 'newtime' can we create the two columns named newtime_1 and newtime_2 with c and s respectively. After that I can find the bussiness time between these 2 columns. – luibrain Apr 05 '23 at 11:41

1 Answers1

0
import numpy as np
import pandas as pd
import business_duration as bd
import holidays as pyholidays
from datetime import time, datetime

holidaylist_RU = pyholidays.Russia(years=[datetime.now().year, datetime.now().year-1])
start_hour = time(10, 0, 0)
end_hour = time(21, 0, 0)
unit_min='min'

df = pd.DataFrame({
    'ID': ['aaa', 'aaa', 'aaa', 'bbb', 'ccc', 'ccc'],
    'closed': ['2023-03-28 22:00', '2023-03-28 22:00', '2023-03-28 22:00', '2023-03-29 23:00', '2023-03-27 22:00', '2023-03-27 22:00'],
    'set': ['2023-03-27 19:00', '2023-03-28 19:15', '2023-03-28 20:00', '2023-03-27 22:00', '2023-03-25 19:00', '2023-03-26 19:30'],
    'message_time': ['19:05', '19:40', '21:00', '22:10', '19:05', '19:40']
})

df['set'] = pd.to_datetime(df['set'])
df['closed'] = pd.to_datetime(df['closed'])
df = df.sort_values(['ID', 'set'])
df['newtime_1'] = df.groupby('ID')['set'].shift(-1).fillna(df['closed'])
df['in work (minutes)'] = df.apply(lambda x: bd.businessDuration(x['set'], x['newtime_1'], start_hour, end_hour, holidaylist=holidaylist_RU, unit=unit_min), axis=1)

Found a solution!

luibrain
  • 63
  • 7