7

In a similar vein to this question, I have a numpy.timedelta64 column in a pandas DataFrame. As per this answer to the aforementioned question, there is a function pandas.tslib.repr_timedelta64 which nicely displays a timedelta in days, hours:minutes:seconds. I would like to format them only in days and hours.

So what I've got is the following:

def silly_format(hours):
    (days, hours) = divmod(hours, 24)
    if days > 0 and hours > 0:
        str_time = "{0:.0f} d, {1:.0f} h".format(days, hours)
    elif days > 0:
        str_time = "{0:.0f} d".format(days)
    else:
        str_time = "{0:.0f} h".format(hours)
    return str_time

df["time"].astype("timedelta64[h]").map(silly_format)

which gets me the desired output but I was wondering whether there is a function in numpy or pandas similar to datetime.strftime that can format numpy.timedelta64 according to some format string provided?


I tried to adapt @Jeff's solution further but it is way slower than my answer. Here it is:

days = time_delta.astype("timedelta64[D]").astype(int)
hours = time_delta.astype("timedelta64[h]").astype(int) % 24
result = days.astype(str)
mask = (days > 0) & (hours > 0)
result[mask] = days.astype(str) + ' d, ' + hours.astype(str) + ' h'
result[(hours > 0) & ~mask] = hours.astype(str) + ' h'
result[(days > 0) & ~mask] = days.astype(str) + ' d'
Community
  • 1
  • 1
Midnighter
  • 3,771
  • 2
  • 29
  • 43

4 Answers4

5

While the answers provided by @sebix and @Jeff show a nice way of converting the timedeltas to days and hours, and @Jeff's solution in particular retains the Series' index, they lacked in flexibility of the final formatting of the string. The solution I'm using now is:

def delta_format(days, hours):
    if days > 0 and hours > 0:
        return "{0:.0f} d, {1:.0f} h".format(days, hours)
    elif days > 0:
        return "{0:.0f} d".format(days)
    else:
        return "{0:.0f} h".format(hours)

days = time_delta.astype("timedelta64[D]")
hours = time_delta.astype("timedelta64[h]") % 24
return [delta_format(d, h) for (d, h) in izip(days, hours)]

which suits me well and I get back the index by inserting that list into the original DataFrame.

Midnighter
  • 3,771
  • 2
  • 29
  • 43
1

Here's how to do it in a vectorized manner.

In [28]: s = pd.to_timedelta(range(5),unit='d') + pd.offsets.Hour(3)

In [29]: s
Out[29]: 
0   0 days, 03:00:00
1   1 days, 03:00:00
2   2 days, 03:00:00
3   3 days, 03:00:00
4   4 days, 03:00:00
dtype: timedelta64[ns]

In [30]: days = s.astype('timedelta64[D]').astype(int)

In [31]: hours = s.astype('timedelta64[h]').astype(int)-days*24

In [32]: days
Out[32]: 
0    0
1    1
2    2
3    3
4    4
dtype: int64

In [33]: hours
Out[33]: 
0    3
1    3
2    3
3    3
4    3
dtype: int64

In [34]: days.astype(str) + ' d, ' + hours.astype(str) + ' h'
Out[34]: 
0    0 d, 3 h
1    1 d, 3 h
2    2 d, 3 h
3    3 d, 3 h
4    4 d, 3 h
dtype: object

If you want exactly as the OP posed:

In [4]: result = days.astype(str) + ' d, ' + hours.astype(str) + ' h'

In [5]: result[days==0] = hours.astype(str) + ' h'

In [6]: result
Out[6]: 
0         3 h
1    1 d, 3 h
2    2 d, 3 h
3    3 d, 3 h
4    4 d, 3 h
dtype: object
Jeff
  • 125,376
  • 21
  • 220
  • 187
  • I appreciate that your solution leaves the `Series` with its index intact. I did not come up with a good way to either display `%d d, %d h` or `%d d` or `%d h` without very convoluted code. So I rather stuck to a list that I insert into the existing `DataFrame` and thus get back the index. – Midnighter Aug 15 '14 at 13:25
  • I updated. As I said, once you have the series you can do pretty much want you want. – Jeff Aug 15 '14 at 13:35
  • I updated my question with your suggestion but it's much more of a hassle than my answer below. – Midnighter Aug 15 '14 at 14:23
  • whatever works. doublt this would actually be slower unless you have a small sized frame, in which case it doesn't matter. – Jeff Aug 15 '14 at 14:27
  • Depends on the definition of small, even on 50k rows, the solution in the edited part of my question is 4 times slower. – Midnighter Aug 15 '14 at 14:44
1

@Midnighter's answer didn't work for me in Python 3, so here's my updated function:

def delta_format(delta: np.timedelta64) -> str:
    days = delta.astype("timedelta64[D]") / np.timedelta64(1, 'D')
    hours = int(delta.astype("timedelta64[h]") / np.timedelta64(1, 'h') % 24)

    if days > 0 and hours > 0:
        return f"{days:.0f} d, {hours:.0f} h"
    elif days > 0:
        return f"{days:.0f} d"
    else:
        return f"{hours:.0f} h"

Basically the same, but with f-strings, and more type coercion.

Seanny123
  • 8,776
  • 13
  • 68
  • 124
0

I don't know how it is done in pandas, but here's my numpy-only approach to your problem:

import numpy as np
t = np.array([200487900000000,180787000000000,400287000000000,188487000000000], dtype='timedelta64[ns]')

days = t.astype('timedelta64[D]').astype(np.int32) # gives: array([2, 2, 4, 2], dtype=int32)
hours = t.astype('timedelta64[h]').astype(np.int32)%24 # gives: array([ 7,  2, 15,  4], dtype=int32)

So I just convert the raw data to the desired output type (let it numpy do), then we have two arrays with the data and are free to use as we want. To group them pairwise, just do:

>>> np.array([days, hours]).T
array([[ 2,  7],
       [ 2,  2],
       [ 4, 15],
       [ 2,  4]], dtype=int32)

For example:

for row in d:
    print('%dd %dh' % tuple(row))

gives:

2d 7h
2d 2h
4d 15h
2d 4h
sebix
  • 2,943
  • 2
  • 28
  • 43
  • Your solution and the use of modulo is nice but it does not retain the full formatting options of the function above. – Midnighter Aug 15 '14 at 13:27