14

I have a pandas.DatetimeIndex, e.g.:

pd.date_range('2012-1-1 02:03:04.000',periods=3,freq='1ms')
>>> [2012-01-01 02:03:04, ..., 2012-01-01 02:03:04.002000]

I would like to round the dates (Timestamps) to the nearest second. How do I do that? The expected result is similar to:

[2012-01-01 02:03:04.000000, ..., 2012-01-01 02:03:04.000000]

Is it possible to accomplish this by rounding a NumPy datetime64[ns] array to seconds without changing the dtype ([ns])?

np.array(['2012-01-02 00:00:00.001'],dtype='datetime64[ns]')
Yariv

4 Answers

16

Update: if you're doing this to a DatetimeIndex or a datetime64 column, a better way is to use np.round directly on the underlying integer values rather than going through apply/map:

np.round(dtindex_or_datetime_col.astype(np.int64), -9).astype('datetime64[ns]')
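For a plain NumPy array, the same idea can be sketched with integer arithmetic only. This is a minimal sketch, not the method from the answer above: the helper name round_to_second and the constant NS_PER_SECOND are made up for illustration, ties are rounded up (half-up) rather than the way np.round treats them, and NaT values are assumed to be absent.

```python
import numpy as np

NS_PER_SECOND = 1_000_000_000  # hypothetical constant, for illustration only


def round_to_second(values):
    """Round datetime64[ns] values to the nearest second, keeping the [ns] dtype."""
    # View the timestamps as integer nanoseconds since the epoch.
    ns = np.asarray(values, dtype="datetime64[ns]").astype(np.int64)
    # Add half a second, then floor to a whole second (ties round up).
    rounded = (ns + NS_PER_SECOND // 2) // NS_PER_SECOND * NS_PER_SECOND
    # Convert back; the dtype stays datetime64[ns].
    return rounded.astype("datetime64[ns]")


arr = np.array(
    ["2012-01-02 00:00:00.001", "2012-01-02 00:00:00.700"],
    dtype="datetime64[ns]",
)
result = round_to_second(arr)
print(result, result.dtype)
```

Staying in int64 avoids any intermediate floating-point step, at the cost of a different tie-breaking rule.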

Old answer (with some more explanation):

Whilst @Matti's answer is clearly the correct way to deal with your situation, I thought I would add an answer showing how you might round a Timestamp to the nearest second:

from pandas import Timestamp  # pandas.lib is no longer available in later pandas versions

t1 = Timestamp('2012-1-1 00:00:00')
t2 = Timestamp('2012-1-1 00:00:00.000333')

In [4]: t1
Out[4]: <Timestamp: 2012-01-01 00:00:00>

In [5]: t2
Out[5]: <Timestamp: 2012-01-01 00:00:00.000333>

In [6]: t2.microsecond
Out[6]: 333

In [7]: t1.value
Out[7]: 1325376000000000000

In [8]: t2.value
Out[8]: 1325376000000333000

# To truncate instead of rounding: t2.value - t2.value % 1000000000
In [9]: int(round(t2.value, -9)) # round away milli-, micro- and nano-seconds
Out[9]: 1325376000000000000

In [10]: Timestamp(int(round(t2.value, -9)))
Out[10]: <Timestamp: 2012-01-01 00:00:00>

Hence you can apply this to the entire index:

def to_the_second(ts):
    # ts.value is the timestamp in integer nanoseconds since the epoch
    return Timestamp(int(round(ts.value, -9)))

dtindex.map(to_the_second)
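As a self-contained sketch of the mapping approach on Python 3: the sample index below is invented for illustration, and the built-in round() on an int uses round-half-to-even, so exact half-second ties may not round up.

```python
import pandas as pd


def to_the_second(ts):
    # ts.value is an integer count of nanoseconds; round() on an int
    # with a negative ndigits returns an int in Python 3.
    return pd.Timestamp(round(ts.value, -9))


# Hypothetical sample data: 04.400, 04.700 and 05.000 seconds.
dtindex = pd.date_range("2012-01-01 02:03:04.400", periods=3, freq="300ms")
rounded = dtindex.map(to_the_second)
print(rounded)
```

Because this calls a Python function once per element, it is likely to be slower on large indexes than a vectorised approach.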
Andy Hayden
15

The round() method was added for DatetimeIndex, Timestamp, TimedeltaIndex and Timedelta in pandas 0.18.0. Now we can do the following:

In[114]: index = pd.DatetimeIndex([pd.Timestamp('2012-01-01 02:03:04.000'), pd.Timestamp('2012-01-01 02:03:04.002'), pd.Timestamp('20130712 02:03:04.500'), pd.Timestamp('2012-01-01 02:03:04.501')])

In[115]: index.values
Out[115]: 
array(['2012-01-01T02:03:04.000000000', '2012-01-01T02:03:04.002000000',
       '2013-07-12T02:03:04.500000000', '2012-01-01T02:03:04.501000000'], dtype='datetime64[ns]')

In[116]: index.round('S')
Out[116]: 
DatetimeIndex(['2012-01-01 02:03:04', '2012-01-01 02:03:04',
               '2013-07-12 02:03:04', '2012-01-01 02:03:05'],
              dtype='datetime64[ns]', freq=None)

round() accepts a frequency parameter. The string aliases it understands are listed in the pandas documentation on offset aliases.
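Applied to the index from the question, this might look like the sketch below. It also uses the companion floor() and ceil() methods, and spells the alias in lowercase ("s"), which recent pandas releases prefer over "S"; the variable names are only illustrative.

```python
import pandas as pd

# The index from the question: three timestamps one millisecond apart.
index = pd.date_range("2012-01-01 02:03:04.000", periods=3, freq="1ms")

rounded = index.round("s")  # nearest second
floored = index.floor("s")  # always down
ceiled = index.ceil("s")    # always up

print(rounded)
print(ceiled)
```

The result keeps the same length and dtype as the input, so duplicate index values are expected.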

wombatonfire
4

There is little point in changing the index itself, since you can just generate it using date_range with the desired frequency parameter, as in your question.

I assume what you are trying to do is change the frequency of a Time Series that contains data, in which case you can use resample (documentation). For example if you have the following time series:

dt_index = pd.date_range('2012-1-1 00:00.001',periods=3, freq='1ms')
ts = pd.Series(np.random.randn(3), index=dt_index)  # requires: import numpy as np


2012-01-01 00:00:00           0.594618
2012-01-01 00:00:00.001000    0.874552
2012-01-01 00:00:00.002000   -0.700076
Freq: L

Then you can change the frequency to seconds using resample, specifying how you want to aggregate the values (mean, sum etc.):

ts.resample('S').sum()  # older pandas releases wrote this as ts.resample('S', how='sum')

2012-01-01 00:00:00    0.594618
2012-01-01 00:00:01    0.174475
Freq: S
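To make the difference between aggregating and relabelling concrete, here is a small sketch with fixed values instead of random ones (the numbers and variable names are invented for illustration). Resampling collapses the rows into one bin per second, whereas rounding the index keeps every row.

```python
import pandas as pd

dt_index = pd.date_range("2012-01-01 00:00:00", periods=3, freq="1ms")
ts = pd.Series([1.0, 2.0, 3.0], index=dt_index)

# Aggregate: all three rows fall into the same one-second bin.
per_second = ts.resample("s").sum()

# Relabel: keep all three rows, but round their timestamps.
relabelled = ts.copy()
relabelled.index = ts.index.round("s")

print(per_second)
print(relabelled)
```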
Matti John
  • This eliminates some of the rows. I would like only to *change* the index value by rounding it to the nearest second. – Yariv Dec 09 '12 at 14:54
  • ah, sorry I hadn't realised you wanted duplicate values. It looks like @hayden's updated answer will do what you want – Matti John Dec 09 '12 at 15:15
3

For more general rounding, you can make use of the fact that Pandas Timestamp objects mostly use the standard library datetime.datetime API, including the datetime.datetime.replace() method.

So, to solve your microsecond rounding problem, you could do:

import datetime
import pandas as pd

times = pd.date_range('2012-1-1 02:03:04.499',periods=3,freq='1ms')
# Add 5e5 microseconds and truncate to simulate rounding
times_rounded = [(x + datetime.timedelta(microseconds=5e5)).replace(microsecond=0) for x in times]

from IPython.display import display
print('Before:')
display(list(times))
print('After:')
display(list(times_rounded))

Output:

Before:
[Timestamp('2012-01-01 02:03:04.499000', offset='L'),
 Timestamp('2012-01-01 02:03:04.500000', offset='L'),
 Timestamp('2012-01-01 02:03:04.501000', offset='L')]
After:
[Timestamp('2012-01-01 02:03:04', offset='L'),
 Timestamp('2012-01-01 02:03:05', offset='L'),
 Timestamp('2012-01-01 02:03:05', offset='L')]

You can use the same technique to, e.g., round to the nearest day (as long as you're not concerned about leap seconds and the like):

times = pd.date_range('2012-1-1 08:00:00', periods=3, freq='4H')
# Add 12 hours and truncate to midnight; minute must be reset as well
times_rounded = [(x + datetime.timedelta(hours=12)).replace(hour=0, minute=0, second=0, microsecond=0) for x in times]
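The add-half-then-truncate technique can be wrapped in a small helper. This is a sketch under assumptions: the function name round_to_day and the sample times are made up for illustration, and the samples deliberately avoid exact noon, where tie-breaking rules may differ between this approach and the built-in round('D').

```python
import datetime

import pandas as pd


def round_to_day(ts):
    # Shift by half a day, then drop everything below the day.
    shifted = ts + datetime.timedelta(hours=12)
    return shifted.replace(hour=0, minute=0, second=0, microsecond=0, nanosecond=0)


# Hypothetical sample times, none of them exactly at noon.
times = pd.to_datetime(
    ["2012-01-01 08:30", "2012-01-01 11:59", "2012-01-01 16:45"]
)
times_rounded = [round_to_day(x) for x in times]
print(times_rounded)
```

For these inputs the result matches times.round("D"), which is usually the simpler choice when a fixed frequency is all that is needed.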

Inspired by this SO post: https://stackoverflow.com/a/19718411/1410871

Daniel Golden