Looking for a faster way to replace year in pandas DatetimeIndex

Question

I have a DataFrame with about 20 million rows and a DatetimeIndex. There are data from different years, and I would like to assign them all to the same year by changing the timestamps. The statements below accomplish this, but are a bit slower than I would like (double-digit seconds).

new_index = [ts.replace(year=2014) for ts in df.index]
df.index = new_index

The data are not evenly spaced, so I cannot generate a new index easily. Is there a better way?

Please include a [reproducible example](https://stackoverflow.com/q/20109391/4985099). – sushanth, Aug 30 '20 at 12:23

score 4 · Accepted Answer · answered Aug 30 '20 at 14:59

Try with:

%%time
new_index = pd.to_datetime({
    'year': 2014,
    'month': df.index.month,
    'day': df.index.day})

CPU times: user 333 ms, sys: 34.4 ms, total: 367 ms
Wall time: 346 ms

Compared to the original:

%%time
new_index = [ts.replace(year=2014) for ts in df.index]

CPU times: user 6.97 s, sys: 115 ms, total: 7.08 s
Wall time: 7.1 s

The timings are for a dataset of 1M rows, but I would expect a similar improvement for 20M.
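
For anyone who wants to reproduce the comparison, here is a minimal sketch on a synthetic index. The sample size, seed, and variable names are assumptions for illustration, not the data behind the timings above. The index deliberately uses non-leap years only, since neither approach can map February 29 into 2014.

```python
import time

import numpy as np
import pandas as pd

# Synthetic, unevenly spaced timestamps from two non-leap years (assumed data).
rng = np.random.default_rng(0)
n = 100_000
offsets = pd.to_timedelta(rng.integers(0, 365 * 24 * 3600, size=n), unit="s")
idx = (pd.Timestamp("2011-01-01") + offsets[: n // 2]).append(
    pd.Timestamp("2013-01-01") + offsets[n // 2 :]
)

# Original approach: one Timestamp.replace call per element.
t0 = time.perf_counter()
slow = pd.DatetimeIndex([ts.replace(year=2014) for ts in idx])
t1 = time.perf_counter()

# Vectorized approach: assemble dates from components.
fast = pd.DatetimeIndex(
    pd.to_datetime({"year": 2014, "month": idx.month, "day": idx.day})
)
t2 = time.perf_counter()

print(f"list comprehension: {t1 - t0:.3f} s")
print(f"to_datetime:        {t2 - t1:.3f} s")

# The vectorized version keeps only the date, so compare against midnight.
same_dates = bool((slow.normalize() == fast).all())
print("same dates:", same_dates)
```

Absolute timings will vary by machine and pandas version; the point is only the relative difference.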

Also, of course, if hours/minutes/seconds are to be preserved, they should be passed to `to_datetime` as well (as `hour`, `minute`, and `second` keys).
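
A minimal sketch of the time-preserving variant, assuming a small made-up DataFrame (the sample timestamps and the `value` column are hypothetical). `pd.to_datetime` returns a Series here, so it is wrapped in `pd.DatetimeIndex` before assignment.

```python
import pandas as pd

# Hypothetical sample data: irregular timestamps from different years.
idx = pd.DatetimeIndex([
    "2011-03-05 09:08:07",
    "2012-07-14 23:59:59",
    "2013-12-31 00:00:01",
])
df = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=idx)

# Assemble new timestamps from components, fixing the year at 2014
# and carrying over the time-of-day fields.
new_index = pd.to_datetime({
    "year": 2014,
    "month": df.index.month,
    "day": df.index.day,
    "hour": df.index.hour,
    "minute": df.index.minute,
    "second": df.index.second,
})
df.index = pd.DatetimeIndex(new_index)
print(df)
```

Sub-second precision would still be dropped in this sketch; `to_datetime` also accepts `ms`, `us`, and `ns` keys if that matters.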

answered by perl

Upvote for including and radically improving exec timing, as the OP specifically requested. – Jason R Stevens CFA Aug 30 '20 at 15:05
Creation of `new_index` went from 80 s to 16 s, and the second statement assigning it to `df` went from 40 s to 0 s. That makes me happy. – adr Aug 30 '20 at 19:47
Great, happy that it helped! – perl Aug 30 '20 at 19:49

score 0 · Answer 2 · answered Aug 30 '20 at 12:31

Please try the following:

import pandas as pd
# Format each timestamp with a fixed year, then parse the strings back
# so the index stays a DatetimeIndex rather than plain strings.
df.index = pd.to_datetime(df.index.strftime('2016-%m-%d %H:%M:%S'))

In the above example, I change the year to 2016; the output is as follows:

df

                       open    high     low   close
2016-01-02 09:08:00  116.00  116.00  116.00  116.00
2016-01-02 09:16:00  116.10  117.80  117.00  113.00
2016-01-03 09:07:00  115.50  116.20  115.50  116.20
2016-01-02 09:19:00  116.00  116.00  115.60  115.75
2016-01-02 09:18:00  116.05  116.35  116.00  116.00

score 0 · Answer 3 · answered Aug 30 '20 at 13:22

You can try:

df.index = pd.DatetimeIndex(df.index)
df.index = df.index + pd.DateOffset(year=2016)

answered by Rajesh
