1

I have a data frame with a column named "system state". The state changes at irregular intervals, typically several minutes apart.

I want the time spent in each state.

Example:

System state  time (of the day)
high          23:37:06
low           23:34:05
normal        22:58:05
normal        22:37:05
normal        22:31:06
low              ...
low
high 
high 
high 

Do I necessarily have to calculate it manually? I tried df['time'] = df['time'].dt.strftime(%S), but it didn't work.

jared
Ines Hdn
  • Using only `%S` will only get you seconds for that minute, to calculate use something like [this](https://stackoverflow.com/a/7852969) and mod with 86400, to get that day's seconds. – Zero Jul 05 '23 at 14:58
  • `dt.strftime(%S)` is a syntax error, because `%S` is not in quotes... – John Gordon Jul 05 '23 at 15:00

2 Answers

1

If you convert your time column to datetime, you can subtract the start of today (midnight) and get the total seconds from the resulting timedelta.

df['time'] = pd.to_datetime(df['time'])
# Get total seconds since midnight
(df['time'] - pd.Timestamp.now().normalize()).dt.total_seconds()

Output

0    85026.0
1    84845.0
2    82685.0
3    81425.0
4    81066.0
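
A possible alternative, sketched here under the assumption that the column holds plain "HH:MM:SS" strings as in the question: parsing them with pd.to_timedelta treats each value as an offset from midnight, so the result does not depend on today's date. The frame below is rebuilt from the question's sample rows; the variable name seconds_since_midnight is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "state": ["high", "low", "normal", "normal", "normal"],
    "time": ["23:37:06", "23:34:05", "22:58:05", "22:37:05", "22:31:06"],
})

# Parse "HH:MM:SS" as a duration since midnight instead of a calendar timestamp
seconds_since_midnight = pd.to_timedelta(df["time"]).dt.total_seconds()
print(seconds_since_midnight)
```

This should give the same five values as the output above, without involving pd.Timestamp.now().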
Chris
0
import numpy as np
import pandas as pd

1) Replicating a part of your dataframe

states = ['high','low','normal','normal','normal']
times = ['23:37:06','23:34:05','22:58:05','22:37:05','22:31:06']
df = pd.DataFrame(columns = ['state','time'], data = list(zip(states, times)))
df
    state      time
0    high  23:37:06
1     low  23:34:05
2  normal  22:58:05
3  normal  22:37:05
4  normal  22:31:06

2) Convert string to actual time

pd.to_datetime converts strings such as "23:37:06" into datetime values that pandas can do arithmetic on. Since the strings carry no date, today's date is filled in.

df['T'] = pd.to_datetime(df['time'])
df
    state      time                   T
0    high  23:37:06 2023-07-05 23:37:06
1     low  23:34:05 2023-07-05 23:34:05
2  normal  22:58:05 2023-07-05 22:58:05
3  normal  22:37:05 2023-07-05 22:37:05
4  normal  22:31:06 2023-07-05 22:31:06

df.dtypes
state            object
time             object
T        datetime64[ns]
dtype: object

3) Calculate the time interval from a "state update" to the next

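.shift(1) gives you a copy of the time column shifted down by one row. Because the rows are sorted newest first, each row then sits next to the time of the following update, so the difference is straightforward to calculate.

Note that the duration of the latest state cannot be known from the data given, so it appears as "Not a Time" (NaT).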

df['NextT'] = df['T'].shift(1)
df['DeltaT'] = df['NextT'] - df['T']
df
    state      time                   T               NextT          DeltaT
0    high  23:37:06 2023-07-05 23:37:06                 NaT             NaT
1     low  23:34:05 2023-07-05 23:34:05 2023-07-05 23:37:06 0 days 00:03:01
2  normal  22:58:05 2023-07-05 22:58:05 2023-07-05 23:34:05 0 days 00:36:00
3  normal  22:37:05 2023-07-05 22:37:05 2023-07-05 22:58:05 0 days 00:21:00
4  normal  22:31:06 2023-07-05 22:31:06 2023-07-05 22:37:05 0 days 00:05:59
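
One caveat with time-of-day values: if the log crosses midnight, the later update has a smaller time and the difference comes out negative. A minimal sketch of one way to handle that, assuming consecutive updates are always less than 24 hours apart (the question mentions several minutes). The three times below are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical log, newest first, that crosses midnight
times = pd.Series(pd.to_timedelta(["00:05:00", "23:50:00", "23:40:00"]))

# Interval from each update to the following one (the row above)
delta = times.shift(1) - times

# A negative interval means the following update fell on the next day,
# so add one day to it; NaT rows are left untouched
delta = delta.mask(delta < pd.Timedelta(0), delta + pd.Timedelta(days=1))
print(delta)
```

If the data carried full dates instead of only the time of day, this correction would not be needed.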

4) Display your time interval column in seconds:

df['Delta seconds'] = df['DeltaT'].dt.total_seconds()  # .dt.seconds would ignore any whole days
df[['state','Delta seconds']]
    state  Delta seconds
0    high            NaN
1     low          181.0
2  normal         2160.0
3  normal         1260.0
4  normal          359.0

5) Finally add up time intervals from identical states

pd.pivot_table(data = df,
               values = ['Delta seconds'],
               index = ['state'],
               aggfunc = 'sum')

        Delta seconds
state                
high              0.0
low             181.0
normal         3779.0
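
Note that "high" shows 0.0 here only because its single interval is unknown and summing skips NaN. As a sketch of an alternative (not the only way to do it), the steps can be condensed with groupby, and min_count=1 keeps such a state as NaN instead of 0. The frame is rebuilt from the question's sample rows; the column name "seconds" and the variable time_per_state are just illustrative.

```python
import pandas as pd

# Sample rows copied from the question (newest first)
df = pd.DataFrame({
    "state": ["high", "low", "normal", "normal", "normal"],
    "time": ["23:37:06", "23:34:05", "22:58:05", "22:37:05", "22:31:06"],
})

t = pd.to_timedelta(df["time"])

# Seconds until the following update, which sits one row above
df["seconds"] = (t.shift(1) - t).dt.total_seconds()

# min_count=1: a state with no known interval stays NaN rather than 0
time_per_state = df.groupby("state")["seconds"].sum(min_count=1)
print(time_per_state)
```

With the sample rows this should report 181 seconds for "low" and 3779 seconds for "normal", and leave "high" as NaN.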

Annex: strftime

Once the 'time' column has been converted to datetime, dt.strftime('%S') only extracts the seconds field of each timestamp, as a string. It does not calculate a total number of seconds.

df['T'].dt.strftime('%S')
0    06
1    05
2    05
3    05
4    06
Name: T, dtype: object
OCa