Format timedelta64 string output

Question

In a similar vein to this question, I have a numpy.timedelta64 column in a pandas DataFrame. As per this answer to the aforementioned question, there is a function pandas.tslib.repr_timedelta64 which nicely displays a timedelta in days, hours:minutes:seconds. I would like to format them only in days and hours.

So what I've got is the following:

def silly_format(hours):
    (days, hours) = divmod(hours, 24)
    if days > 0 and hours > 0:
        str_time = "{0:.0f} d, {1:.0f} h".format(days, hours)
    elif days > 0:
        str_time = "{0:.0f} d".format(days)
    else:
        str_time = "{0:.0f} h".format(hours)
    return str_time

df["time"].astype("timedelta64[h]").map(silly_format)

which gets me the desired output but I was wondering whether there is a function in numpy or pandas similar to datetime.strftime that can format numpy.timedelta64 according to some format string provided?

I tried to adapt @Jeff's solution further but it is way slower than my answer. Here it is:

days = time_delta.astype("timedelta64[D]").astype(int)
hours = time_delta.astype("timedelta64[h]").astype(int) % 24
result = days.astype(str)
mask = (days > 0) & (hours > 0)
result[mask] = days.astype(str) + ' d, ' + hours.astype(str) + ' h'
result[(hours > 0) & ~mask] = hours.astype(str) + ' h'
result[(days > 0) & ~mask] = days.astype(str) + ' d'

Is it an option to use `pandas.tslib.repr_timedelta64` but then cut off the minute/second part? — joris, Aug 15 '14 at 11:44
Yes, for example. Or remove the last 6 characters (but that seems less robust if there could be microseconds) — joris, Aug 15 '14 at 11:57
It works 0K. The hours are zero padded in that case and it's actually slightly slower than the above method. — Midnighter, Aug 15 '14 at 12:03
If it is slower, you should just use your own solution, which looks good! — joris, Aug 15 '14 at 12:11

score 5 · Accepted Answer · answered Aug 15 '14 at 13:35

While the answers provided by @sebix and @Jeff show a nice way of converting the timedeltas to days and hours, and @Jeff's solution in particular retains the Series' index, they lacked in flexibility of the final formatting of the string. The solution I'm using now is:

def delta_format(days, hours):
    if days > 0 and hours > 0:
        return "{0:.0f} d, {1:.0f} h".format(days, hours)
    elif days > 0:
        return "{0:.0f} d".format(days)
    else:
        return "{0:.0f} h".format(hours)

days = time_delta.astype("timedelta64[D]")
hours = time_delta.astype("timedelta64[h]") % 24
return [delta_format(d, h) for (d, h) in izip(days, hours)]

which suits me well and I get back the index by inserting that list into the original DataFrame.

Jeff · Answer 2 · 2014-08-15T13:34:55.453

1

Here's how to do it in a vectorized manner.

In [28]: s = pd.to_timedelta(range(5),unit='d') + pd.offsets.Hour(3)

In [29]: s
Out[29]: 
0   0 days, 03:00:00
1   1 days, 03:00:00
2   2 days, 03:00:00
3   3 days, 03:00:00
4   4 days, 03:00:00
dtype: timedelta64[ns]

In [30]: days = s.astype('timedelta64[D]').astype(int)

In [31]: hours = s.astype('timedelta64[h]').astype(int)-days*24

In [32]: days
Out[32]: 
0    0
1    1
2    2
3    3
4    4
dtype: int64

In [33]: hours
Out[33]: 
0    3
1    3
2    3
3    3
4    3
dtype: int64

In [34]: days.astype(str) + ' d, ' + hours.astype(str) + ' h'
Out[34]: 
0    0 d, 3 h
1    1 d, 3 h
2    2 d, 3 h
3    3 d, 3 h
4    4 d, 3 h
dtype: object

If you want exactly as the OP posed:

In [4]: result = days.astype(str) + ' d, ' + hours.astype(str) + ' h'

In [5]: result[days==0] = hours.astype(str) + ' h'

In [6]: result
Out[6]: 
0         3 h
1    1 d, 3 h
2    2 d, 3 h
3    3 d, 3 h
4    4 d, 3 h
dtype: object

edited Aug 15 '14 at 13:34

answered Aug 15 '14 at 12:39

Jeff

125,376
21
220
187

I appreciate that your solution leaves the `Series` with its index intact. I did not come up with a good way to either display `%d d, %d h` or `%d d` or `%d h` without very convoluted code. So I rather stuck to a list that I insert into the existing `DataFrame` and thus get back the index. – Midnighter Aug 15 '14 at 13:25
I updated. As I said, once you have the series you can do pretty much want you want. – Jeff Aug 15 '14 at 13:35
I updated my question with your suggestion but it's much more of a hassle than my answer below. – Midnighter Aug 15 '14 at 14:23
whatever works. doublt this would actually be slower unless you have a small sized frame, in which case it doesn't matter. – Jeff Aug 15 '14 at 14:27
Depends on the definition of small, even on 50k rows, the solution in the edited part of my question is 4 times slower. – Midnighter Aug 15 '14 at 14:44

score 1 · Answer 3 · answered Oct 12 '18 at 21:38

@Midnighter's answer didn't work for me in Python 3, so here's my updated function:

def delta_format(delta: np.timedelta64) -> str:
    days = delta.astype("timedelta64[D]") / np.timedelta64(1, 'D')
    hours = int(delta.astype("timedelta64[h]") / np.timedelta64(1, 'h') % 24)

    if days > 0 and hours > 0:
        return f"{days:.0f} d, {hours:.0f} h"
    elif days > 0:
        return f"{days:.0f} d"
    else:
        return f"{hours:.0f} h"

Basically the same, but with f-strings, and more type coercion.

sebix · Answer 4 · 2014-08-15T13:14:31.700

I don't know how it is done in pandas, but here's my numpy-only approach to your problem:

import numpy as np
t = np.array([200487900000000,180787000000000,400287000000000,188487000000000], dtype='timedelta64[ns]')

days = t.astype('timedelta64[D]').astype(np.int32) # gives: array([2, 2, 4, 2], dtype=int32)
hours = t.astype('timedelta64[h]').astype(np.int32)%24 # gives: array([ 7,  2, 15,  4], dtype=int32)

So I just convert the raw data to the desired output type (let it numpy do), then we have two arrays with the data and are free to use as we want. To group them pairwise, just do:

>>> np.array([days, hours]).T
array([[ 2,  7],
       [ 2,  2],
       [ 4, 15],
       [ 2,  4]], dtype=int32)

For example:

for row in d:
    print('%dd %dh' % tuple(row))

gives:

2d 7h
2d 2h
4d 15h
2d 4h

Your solution and the use of modulo is nice but it does not retain the full formatting options of the function above. — Midnighter, Aug 15 '14 at 13:27

Format timedelta64 string output

4 Answers4