71

I have a pandas dataframe that has two datetime64 columns and one timedelta64 column that is the difference between the two columns. I'm trying to plot a histogram of the timedelta column to visualize the time differences between the two events.

However, just using df['time_delta'] as the histogram input results in: TypeError: ufunc add cannot use operands with types dtype('<m8[ns]') and dtype('float64')

Trying to convert the timedelta column to float with df2 = df1['time_delta'].astype(float) results in: TypeError: cannot astype a timedelta from [timedelta64[ns]] to [float64]

How would one create a histogram of pandas timedelta data?

PepperoniPizza
DataSwede
  • 6
    How did you finally plot it? I'm unable to simply plot a series with value datatype timedelta64. The error says 'no numeric data to plot'! – Parisa Rai Apr 14 '16 at 08:19

4 Answers

59

Here are some ways to convert timedeltas (see the pandas timedelta documentation):

In [2]: pd.to_timedelta(np.arange(5),unit='d')+pd.to_timedelta(1,unit='s')
Out[2]: 
0   0 days, 00:00:01
1   1 days, 00:00:01
2   2 days, 00:00:01
3   3 days, 00:00:01
4   4 days, 00:00:01
dtype: timedelta64[ns]

Converting to seconds is an exact conversion here:

In [3]: (pd.to_timedelta(np.arange(5),unit='d')+pd.to_timedelta(1,unit='s')).astype('timedelta64[s]')
Out[3]: 
0         1
1     86401
2    172801
3    259201
4    345601
dtype: float64

Converting with astype truncates to that unit, so anything smaller than the unit is dropped:

In [4]: (pd.to_timedelta(np.arange(5),unit='d')+pd.to_timedelta(1,unit='s')).astype('timedelta64[D]')
Out[4]: 
0    0
1    1
2    2
3    3
4    4
dtype: float64

Division gives an exact representation:

In [5]: (pd.to_timedelta(np.arange(5),unit='d')+pd.to_timedelta(1,unit='s')) / np.timedelta64(1,'D')
Out[5]: 
0    0.000012
1    1.000012
2    2.000012
3    3.000012
4    4.000012
dtype: float64
Jeff
  • Perfect. Thanks! I totally skipped over the to_timedelta section when looking for a from_timedelta... – DataSwede May 08 '14 at 14:26
  • 13
    umm... so these are some ways to convert timedeltas... but the question was asking for ways to plot timedelta information. Care to elaborate?? – drevicko Sep 14 '17 at 09:52
  • You can convert timedeltas to float or string to plot; plotting them directly is not currently supported, though it could be added via a pull request from the community. – Jeff Sep 14 '17 at 10:03
39

You can plot nice histograms using the numpy timedelta data types.

For example:

df['time_delta'].astype('timedelta64[s]').plot.hist()

will produce a histogram of the time deltas in seconds. To use minutes instead, you could do this:

(df['time_delta'].astype('timedelta64[s]') / 60).plot.hist()

or use the [m] timedelta unit, which keeps only whole minutes:

df['time_delta'].astype('timedelta64[m]').plot.hist()
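
The result of `astype('timedelta64[...]')` appears to depend on the pandas version (newer releases may return a timedelta-typed result, or reject some units, instead of returning floats), so the following is a hedged alternative rather than a replacement for the lines above. It uses `Series.dt.total_seconds()`, which returns floats, and keeps fractional minutes. The sample durations are hypothetical stand-ins for `df['time_delta']`.

```python
import pandas as pd

# Hypothetical durations standing in for df['time_delta'].
time_delta = pd.Series(pd.to_timedelta(["90s", "144s", "10min", "1h"]))

# total_seconds() returns the full duration of each element as float64.
seconds = time_delta.dt.total_seconds()

# Fractional minutes are kept: 144 s becomes 2.4 min, not 2 min.
minutes = seconds / 60
```

With matplotlib installed, `minutes.plot.hist()` should then draw the histogram in minutes.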

Here's a list of other time delta types (from the docs) you might want, depending on the resolution you need:

Code  Meaning      Time span (relative)  Time span (absolute)
h     hour         +/- 1.0e15 years      [1.0e15 BC, 1.0e15 AD]
m     minute       +/- 1.7e13 years      [1.7e13 BC, 1.7e13 AD]
s     second       +/- 2.9e11 years      [2.9e11 BC, 2.9e11 AD]
ms    millisecond  +/- 2.9e8 years       [2.9e8 BC, 2.9e8 AD]
us    microsecond  +/- 2.9e5 years       [290301 BC, 294241 AD]
ns    nanosecond   +/- 292 years         [1678 AD, 2262 AD]
ps    picosecond   +/- 106 days          [1969 AD, 1970 AD]
fs    femtosecond  +/- 2.6 hours         [1969 AD, 1970 AD]
as    attosecond   +/- 9.2 seconds       [1969 AD, 1970 AD]
Alex
  • what is `s` and `m` in `timedelta64[s]` or `timedelta64[m]`? – matt b Mar 25 '19 at 16:00
  • They indicate the unit of the timedelta: `s` means seconds and `m` means minutes in this case. Keep in mind that values are cut off rather than rounded when you convert to a larger unit. – Mac C. Mar 26 '19 at 20:38
  • I think @mattb was referring to the fact that the code doesn't compile. I believe it's supposed to be `df['time_delta'].astype("timedelta64[m]")`. – hhquark May 17 '19 at 17:33
  • .astype('timedelta64[m]') seems to drop the decimal part of the minutes, i.e. 2.4 min becomes 2 min. – Yuanyi Wu Jan 12 '21 at 06:10
14

How about

df['time_delta'].dt.days.hist()

...? (Where you can use seconds, microseconds, or nanoseconds instead of days depending on your needs / your data).
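
One caveat worth illustrating: the `.dt.days` and `.dt.seconds` accessors return components of each timedelta, not the whole duration in that unit. A small sketch with hypothetical values, comparing them with `.dt.total_seconds()`:

```python
import pandas as pd

# Hypothetical durations: one day plus five seconds, and one hour.
td = pd.Series(pd.to_timedelta(["1 days 00:00:05", "0 days 01:00:00"]))

days = td.dt.days              # whole-day component only
seconds_part = td.dt.seconds   # seconds left over within the day
total = td.dt.total_seconds()  # full duration expressed in seconds
```

For the first value, `seconds_part` is 5 while `total` is 86405.0, so a histogram of `.dt.seconds` only reflects the full duration when every delta is shorter than one day.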

Stas
4

Another method (that worked for me) is to simply divide by a Timedelta:

plt.hist(df['time_delta']/pd.Timedelta(minutes=1), bins=20)
edelans