Is there a way to compute the median of a datetime column and return it in datetime format? I want to calculate the median of a pandas column whose dtype is datetime64[ns]. Below is a sample of the column:

df['date'].head()

0   2017-05-08 13:25:13.342
1   2017-05-08 16:37:45.545
2   2017-01-12 11:08:04.021
3   2016-12-01 09:06:29.912
4   2016-06-08 03:16:40.422

Name: recency, dtype: datetime64[ns]

My aim is to get the median in the same datetime format as the date column above.

I tried converting to np.array:

median_ = np.median(np.array(df['date']))

But that throws the error:

TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')

Converting to int64, calculating the median, and then attempting to convert the result back to datetime does not work either:

df['date'].astype('int64').median().astype('datetime64[ns]')
T-Jay

3 Answers

You can also try quantile(0.5):

df['date'].astype('datetime64[ns]').quantile(0.5, interpolation="midpoint")
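For reference, here is a minimal sketch of the quantile approach run against the five sample rows from the question. The DataFrame is rebuilt by hand for illustration, and the variable name median_date is my own. Depending on the pandas/NumPy version, the interpolation may go through floating point, so I would not rely on the result being exact to the nanosecond.

```python
import pandas as pd

# Sample timestamps copied from the question (rebuilt by hand for illustration).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-05-08 13:25:13.342",
        "2017-05-08 16:37:45.545",
        "2017-01-12 11:08:04.021",
        "2016-12-01 09:06:29.912",
        "2016-06-08 03:16:40.422",
    ])
})

# The 0.5 quantile is the median; "midpoint" averages the two middle
# values when the number of rows is even.
median_date = df["date"].quantile(0.5, interpolation="midpoint")

print(type(median_date))  # a pandas Timestamp, not a float or an int
print(median_date)
```

The column is already datetime64[ns] here, so the extra astype('datetime64[ns]') from the one-liner above is not needed in this sketch.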
Bora M. Alper
user394430

How about just taking the middle value?

# DataFrame.sort() was removed from pandas; sort_values() replaces it
dates = list(df.sort_values('date')['date'])
print(dates[len(dates) // 2])

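If the table is already sorted, you can skip the sorting step. Note that for an even number of rows this picks the upper of the two middle values rather than their midpoint.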
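A small sketch of the same sort-and-index idea, extended to average the two middle values when the row count is even. The helper name middle_timestamp is hypothetical (not a pandas function), and it assumes the column has at least one non-null value.

```python
import pandas as pd

def middle_timestamp(series):
    """Median of a datetime Series by sorting and indexing (hypothetical helper)."""
    # Drop missing values and sort; assumes at least one value remains.
    ordered = series.dropna().sort_values().reset_index(drop=True)
    n = len(ordered)
    upper = ordered.iloc[n // 2]
    if n % 2 == 1:
        # Odd count: the single middle element is the median.
        return upper
    # Even count: take the point halfway between the two middle elements.
    lower = ordered.iloc[n // 2 - 1]
    return lower + (upper - lower) / 2

# Sample timestamps copied from the question.
odd = pd.Series(pd.to_datetime([
    "2017-05-08 13:25:13.342",
    "2017-05-08 16:37:45.545",
    "2017-01-12 11:08:04.021",
    "2016-12-01 09:06:29.912",
    "2016-06-08 03:16:40.422",
]))
# Made-up two-row example to exercise the even branch.
even = pd.Series(pd.to_datetime(["2017-01-01", "2017-01-03"]))

odd_median = middle_timestamp(odd)
even_median = middle_timestamp(even)
print(odd_median)
print(even_median)
```

Subtracting two timestamps gives a Timedelta, which can be halved and added back, so this never needs to add two datetimes together (the operation that np.median trips over in the question).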

kabanus
  • Thanks @kabanus. This works well. It did not occur to me to sort and use the length of the column. – T-Jay May 11 '17 at 05:35

You are close. median() returns a float, so convert it to an int first:

import math
import numpy as np

median = math.floor(df['date'].astype('int64').median())

Then convert the int that represents the date into a datetime64:

result = np.datetime64(median, "ns")  # unit: nanoseconds
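Put together, the int64 round trip could look like the sketch below, again using the sample rows from the question (the DataFrame and the variable names are rebuilt here for illustration). One caveat: the median comes back as a float64, which has a 53-bit mantissa, so nanosecond-since-epoch values of this size can be rounded by up to a few hundred nanoseconds. That is usually harmless, but it is why the comparison at the end uses a tolerance.

```python
import math

import numpy as np
import pandas as pd

# Sample timestamps copied from the question.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-05-08 13:25:13.342",
        "2017-05-08 16:37:45.545",
        "2017-01-12 11:08:04.021",
        "2016-12-01 09:06:29.912",
        "2016-06-08 03:16:40.422",
    ])
})

# Nanoseconds since the Unix epoch, one int64 per row.
as_ns = df["date"].astype("int64")

# median() returns a float; floor it to get a plain Python int.
median_ns = math.floor(as_ns.median())

# Either constructor turns the integer back into a datetime.
result_np = np.datetime64(median_ns, "ns")
result_pd = pd.Timestamp(median_ns, unit="ns")

# Compare with a tolerance because of the float64 rounding noted above.
expected = pd.Timestamp("2017-01-12 11:08:04.021")
print(abs(result_pd - expected) < pd.Timedelta("1us"))
```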
SalaryNotFound