0

I processing time-series data within a pandas dataframe. The datetime index is incomplete (i.e. some dates are missing).

I want to create a new column with a datetime series of 1 year offset, but only containg dates present in the original datetimeindex . The challenge is that the exact 1y match is not present in the index in many cases.

Index (Input)   1 year offset (Output)
1/2/2014        None
1/3/2014        None
1/6/2014        None
1/7/2014        None
1/9/2014        None
1/10/2014       None
1/2/2015        1/2/2014
1/5/2015        1/3/2014
1/6/2015        1/6/2014
1/7/2015        1/7/2014
1/8/2015        1/9/2014
1/9/2015        1/10/2014

The requirements are as follows:

  1. Every date as of 1/2/2015 must have a corresponding offset date (no blanks)
  2. Every date within the "offset date" group must also be present in the Index column (i.e. introduction of new dates, like 1/8/2014, is not desired

  3. All offset dates must be ordered in an ascending way (the sequence of dates must be preserved)

What I have tried so far:

  1. The Dateoffset doesn't help, since it is insensitive to dates not present in the index.
  2. The .shift method data["1 year offset (Output)"] = data.Index.shift(365) doesn't help because the number of dates within the index is different across the years.

What I am trying to do now has several steps:

  1. Apply Dateoffset method at first to create "temp 1 year offset"
  2. Remove single dates from "temp 1 year offset" that are not present in datetimeindex using set(list) method and replace cells by NaN

  3. Select dates in datetimeindex whose "temp 1 year offset" is NaN and substract one year

  4. Map the Dates from (3) to its closest date in the datetimeindex using argmin

The challenge here is that I am getting double entries as well as a descending order of days in some cases. Those mess up with the results in the following way (see the timedeltas between day n and day n+1):

Index (Input)   1 year offset (Output) Timedelta
4/17/2014       4/16/2014              1
4/22/2014       4/17/2014              1
4/23/2014       4/25/2014              8
4/24/2014       None
4/25/2014       4/22/2014             -3
4/28/2014       4/23/2014              1
4/29/2014       4/24/2014              1
4/30/2014       4/25/2014              1

In any case, this last approach seems to be an overkill concerning the simplicity of the underlying goal. Is there a faster and more simple way to do it?

How to group every date in an uneven pandas datetime series with the closest date one year ago in the same series?

sudonym
  • 3,788
  • 4
  • 36
  • 61
  • many thanks for your answer - list comprehension is new to me - could you be a bit less minimalist? – sudonym Nov 21 '17 at 07:21

1 Answers1

1

This would be a way:

However look at this thread to properly handle 1 year when the year has 366 days: Add one year in current date PYTHON

This code therefore needs some small modifications.

import pandas as pd
import datetime

df = pd.DataFrame(dict(dates=[
 '1/3/2014',
 '1/6/2014',
 '1/7/2014',
 '1/9/2014',
 '1/10/2014',
 '1/2/2015',
 '1/5/2015',
 '1/6/2015',
 '1/7/2015',
 '1/8/2015',
 '1/9/2015']))

# Convert column to datetime
df.dates = pd.to_datetime(df.dates)

# Store min(year) as a variable
minyear = min(df.dates).year

# Calculate the day with timedelta -365 days (might fail on 2012?)
df['offset'] = [(i + datetime.timedelta(days=-365)).date()
                if i.year != minyear else None for i in df.dates]

df

Returns:

        dates      offset
0  2014-01-03        None
1  2014-01-06        None
2  2014-01-07        None
3  2014-01-09        None
4  2014-01-10        None
5  2015-01-02  2014-01-02
6  2015-01-05  2014-01-05
7  2015-01-06  2014-01-06
8  2015-01-07  2014-01-07
9  2015-01-08  2014-01-08
10 2015-01-09  2014-01-09
Anton vBR
  • 18,287
  • 5
  • 40
  • 46
  • +1 for elucidating your approach. This doesnt solve the issue. In your above example, the "offset"-column2 output contains 6 dates. But only 3 of those dates are also present in column1. I need something which only chooses an offset date for column2 among the complete group of dates present in column 1. Sorry for my lack of precision, I adjust the question accordingly. – sudonym Nov 21 '17 at 07:48
  • @sudonym yep that makes things more complicated. Unfortunately I need to work on smh else for the time being and can’t help you further atm. – Anton vBR Nov 21 '17 at 07:53
  • Reading your question carefully again my answer makes no sense :/ – Anton vBR Nov 21 '17 at 08:02