Format Aggregate timedelta in pandas pivot table

Question

Ok so I have looked at the pandas pivot table documentation and the following similar questions:

However my issue is the outputted timedeltas when aggregated they show the format 1 day 12:30:00 when I want it to display 36:30:00. Is there a way to do this?

CODE

pd.pivot_table(
  df,
  index=["date","task_type"],
  columns=["calendar"],
  values="duration",
  aggfunc=np.sum,
  fill_value=timedelta(hours=0),
  margins=True,
  margins_name="TOTAL"
)

OUTPUT

calendar                  Proj1    Proj2           Proj3           TOTAL
date       task_type
2017-05-04             01:00:00 00:00:00 0 days 00:00:00 0 days 01:00:00
           development 00:00:00 00:00:00 0 days 02:00:00 0 days 02:00:00
           training    00:00:00 00:00:00 0 days 04:00:00 0 days 04:00:00
2017-05-05 admin       00:00:00 02:00:00 0 days 00:00:00 0 days 02:00:00
           development 00:30:00 00:00:00 0 days 03:30:00 0 days 04:00:00
           meeting     00:00:00 00:00:00 0 days 01:00:00 0 days 01:00:00
2017-05-08             01:30:00 00:00:00 0 days 00:00:00 0 days 01:30:00
           admin       00:00:00 02:00:00 0 days 00:00:00 0 days 02:00:00
           development 00:00:00 00:00:00 0 days 05:00:00 0 days 05:00:00
2017-05-09 admin       02:00:00 01:00:00 0 days 00:00:00 0 days 03:00:00
           development 00:00:00 00:00:00 0 days 01:00:00 0 days 01:00:00
           research    01:00:00 00:00:00 0 days 00:00:00 0 days 01:00:00
           training    00:00:00 00:00:00 0 days 03:30:00 0 days 03:30:00
2017-05-10 admin       00:00:00 01:30:00 0 days 00:00:00 0 days 01:30:00
           development 00:00:00 00:00:00 0 days 02:30:00 0 days 02:30:00
           meeting     02:00:00 00:00:00 0 days 00:00:00 0 days 02:00:00
           training    00:00:00 00:00:00 0 days 02:00:00 0 days 02:00:00
2017-05-11 admin       00:00:00 01:00:00 0 days 00:00:00 0 days 01:00:00
           development 00:00:00 02:30:00 0 days 00:00:00 0 days 02:30:00
TOTAL                  08:00:00 10:00:00 1 days 00:30:00 1 days 18:30:00

EXPECTED

calendar                  Proj1    Proj2           Proj3           TOTAL
date       task_type
2017-05-04             01:00:00 00:00:00        00:00:00        01:00:00
           development 00:00:00 00:00:00        02:00:00        02:00:00
           training    00:00:00 00:00:00        04:00:00        04:00:00
2017-05-05 admin       00:00:00 02:00:00        00:00:00        02:00:00
           development 00:30:00 00:00:00        03:30:00        04:00:00
           meeting     00:00:00 00:00:00        01:00:00        01:00:00
2017-05-08             01:30:00 00:00:00        00:00:00        01:30:00
           admin       00:00:00 02:00:00        00:00:00        02:00:00
           development 00:00:00 00:00:00        05:00:00        05:00:00
2017-05-09 admin       02:00:00 01:00:00        00:00:00        03:00:00
           development 00:00:00 00:00:00        01:00:00        01:00:00
           research    01:00:00 00:00:00        00:00:00        01:00:00
           training    00:00:00 00:00:00        03:30:00        03:30:00
2017-05-10 admin       00:00:00 01:30:00        00:00:00        01:30:00
           development 00:00:00 00:00:00        02:30:00        02:30:00
           meeting     02:00:00 00:00:00        00:00:00        02:00:00
           training    00:00:00 00:00:00        02:00:00        02:00:00
2017-05-11 admin       00:00:00 01:00:00        00:00:00        01:00:00
           development 00:00:00 02:30:00        00:00:00        02:30:00
TOTAL                  08:00:00 10:00:00        24:30:00        42:30:00

score 1 · Accepted Answer · edited May 23 '17 at 12:10

Not so easy, but I try implemented this solution with apply:

def f(x):
    ts = x.dt.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours.abs().astype(int).astype(str).str.zfill(2) + ':' + \
           minutes.abs().astype(int).astype(str).str.zfill(2)


df1 = df.apply(f)
print (df1)
calendar                       Proj1  Proj2  Proj3   TOTAL
date                task_type                             
2017-04-03 15:30:00 c          00:00  00:00  00:00   00:00
                    e          10:30  10:30  00:00   21:00
2017-04-03 19:00:00 d          24:30  00:00  10:30   35:00
2017-04-04 05:30:00 a          00:00  03:30  00:00   03:30
2017-04-04 19:30:00 a          00:00  21:00  00:00   21:00
2017-04-04 23:00:00 a          00:00  00:00  03:30   03:30
                    c          00:00  00:00  17:30   17:30
                    d          00:00  17:30  00:00   17:30
TOTAL                          35:00  52:30  31:30  119:00

... and with applymap:

def f(x):
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return ('{:02d}:{:02d}').format(int(hours), int(minutes)) 

df2 = df.applymap(f)
print (df2)
calendar                       Proj1  Proj2  Proj3   TOTAL
date                task_type                             
2017-04-03 15:30:00 c          00:00  00:00  00:00   00:00
                    e          10:30  10:30  00:00   21:00
2017-04-03 19:00:00 d          24:30  00:00  10:30   35:00
2017-04-04 05:30:00 a          00:00  03:30  00:00   03:30
2017-04-04 19:30:00 a          00:00  21:00  00:00   21:00
2017-04-04 23:00:00 a          00:00  00:00  03:30   03:30
                    c          00:00  00:00  17:30   17:30
                    d          00:00  17:30  00:00   17:30
TOTAL                          35:00  52:30  31:30  119:00

Sample setup:

N = 10
np.random.seed(10)
rng = pd.date_range('2017-04-03 15:30:00', periods=N, freq='3.5H')
df = pd.DataFrame({'date': np.random.choice(rng, size=N), 
                   'task_type': np.random.choice(list('abcde'), size=N), 
                   'calendar': np.random.choice(['Proj1','Proj2','Proj3'], size=N),
                   'duration': np.abs(np.random.choice(rng, size=N) - 
                                      np.random.choice(rng, size=N)) })  
#print (df)



df = pd.pivot_table(
  df,
  index=["date","task_type"],
  columns=["calendar"],
  values="duration",
  aggfunc=np.sum,
  fill_value=pd.Timedelta(hours=0),
  margins=True,
  margins_name="TOTAL"
)

print (df)
calendar                                Proj1           Proj2           Proj3  \
date                task_type                                                   
2017-04-03 15:30:00 c         0 days 00:00:00 0 days 00:00:00 0 days 00:00:00   
                    e         0 days 10:30:00 0 days 10:30:00 0 days 00:00:00   
2017-04-03 19:00:00 d         1 days 00:30:00 0 days 00:00:00 0 days 10:30:00   
2017-04-04 05:30:00 a         0 days 00:00:00 0 days 03:30:00 0 days 00:00:00   
2017-04-04 19:30:00 a         0 days 00:00:00 0 days 21:00:00 0 days 00:00:00   
2017-04-04 23:00:00 a         0 days 00:00:00 0 days 00:00:00 0 days 03:30:00   
                    c         0 days 00:00:00 0 days 00:00:00 0 days 17:30:00   
                    d         0 days 00:00:00 0 days 17:30:00 0 days 00:00:00   
TOTAL                         1 days 11:00:00 2 days 04:30:00 1 days 07:30:00   

calendar                                TOTAL  
date                task_type                  
2017-04-03 15:30:00 c         0 days 00:00:00  
                    e         0 days 21:00:00  
2017-04-03 19:00:00 d         1 days 11:00:00  
2017-04-04 05:30:00 a         0 days 03:30:00  
2017-04-04 19:30:00 a         0 days 21:00:00  
2017-04-04 23:00:00 a         0 days 03:30:00  
                    c         0 days 17:30:00  
                    d         0 days 17:30:00  
TOTAL                         4 days 23:00:00

I went with the `applymap` option. – Josh Peak May 11 '17 at 20:35 — Josh Peak, May 11 '17 at 20:35

Format Aggregate timedelta in pandas pivot table

1 Answers1