
I have a pandas.DatetimeIndex for an interval ['2018-01-01', '2018-01-04') (start included, end excluded) and freq=1D:

>>> index = pd.DatetimeIndex(start='2018-01-01',
...                          end='2018-01-04',
...                          freq='1D',
...                          closed='left')
>>> index
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03'],
              dtype='datetime64[ns]',
              freq='D')

How can I get the original open end, end='2018-01-04', back from the index? I need it for a DB query with timestamp ranges.

  1. There is no index.end
  2. index[-1] returns '2018-01-03'
  3. index[-1] + index.freq works in this case but is wrong for freq='2D'
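A quick check of the third point, sketched with pd.date_range (the start/end arguments of the DatetimeIndex constructor were deprecated in pandas 0.24 and removed in 1.0). To avoid depending on the closed/inclusive keyword, whose name changed between pandas versions, the hypothetical helper left_closed_range drops the end point by hand:

```python
import pandas as pd


def left_closed_range(start, end, freq):
    """Hypothetical helper: emulate closed='left' by dropping `end` if it lands on the grid."""
    idx = pd.date_range(start=start, end=end, freq=freq)
    if len(idx) and idx[-1] == pd.Timestamp(end):
        idx = idx[:-1]  # slicing keeps idx.freq
    return idx


for freq in ('1D', '2D'):
    index = left_closed_range('2018-01-01', '2018-01-04', freq)
    guess = index[-1] + index.freq  # "last point + one step"
    print(freq, list(index.strftime('%Y-%m-%d')), guess.date())
```

For '1D' the guess happens to be 2018-01-04, but for '2D' the index is ['2018-01-01', '2018-01-03'] and the guess is 2018-01-05, not the original end.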
eumiro
  • Why not use `max(index)`? – Anna Iliukovich-Strakovskaia Oct 09 '18 at 13:24
  • because it would return `index[-1]` and not the end parameter. – Mohit Motwani Oct 09 '18 at 13:26
  • @AnnaIliukovich-Strakovskaia `max(index)` returns `'2018-01-03'` just like `index[-1]`. I want `'2018-01-04'` because this is what was my `end` in the constructor. – eumiro Oct 09 '18 at 13:26
  • Why not just save it when you create the index? – user3483203 Oct 09 '18 at 13:34
  • @user3483203 I could create a dummy object with `start/end/freq` attributes and then pass it around and create the `DatetimeIndex` on the fly only when I need it. Not so practical if this `DatetimeIndex` is already attached to an existing `DataFrame`. I'd like to avoid keeping too many variables around. – eumiro Oct 09 '18 at 13:40
  • I think you're going to have to store it yourself: I can't see it stored inside the index, and since different `end` values can generate the same index values, you can't recover it after the fact in general. – DSM Oct 11 '18 at 19:43
  • Why is `index[-1] + index.freq` wrong? – quest Oct 16 '18 at 18:43
  • @PankajJoshi because for `freq=2D` the `index[-1]` would be `2018-01-03` and adding `index.freq` would return `2018-01-05`, which is not the original `end`. – eumiro Oct 16 '18 at 18:45
  • @eumiro Ah, right. It seems the info is simply lost. – quest Oct 16 '18 at 18:47
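The "store it yourself" approach from the comments (keep start/end/freq together and build the DatetimeIndex only when it's needed) could look roughly like the sketch below. TimeRange and its index property are hypothetical names, and the sketch assumes a left-closed range built with pd.date_range rather than the old DatetimeIndex(start=..., end=...) constructor:

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class TimeRange:
    """Hypothetical container for a half-open range [start, end) sampled every `freq`."""
    start: pd.Timestamp
    end: pd.Timestamp  # exclusive bound, kept as-is for the DB query
    freq: str

    @property
    def index(self) -> pd.DatetimeIndex:
        # Build the grid on demand, then drop `end` itself to keep the range half-open.
        idx = pd.date_range(start=self.start, end=self.end, freq=self.freq)
        if len(idx) and idx[-1] == self.end:
            idx = idx[:-1]
        return idx


r = TimeRange(pd.Timestamp('2018-01-01'), pd.Timestamp('2018-01-04'), '2D')
print(r.index)
print(r.end)  # the original exclusive end, not index[-1] + freq
```

The trade-off is the one raised in the comments: the object has to be passed around alongside any DataFrame indexed by r.index.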

2 Answers


There's no way, because this information is lost once the object is constructed. At creation time, the interval is unfolded into the resulting sequence:

pandas/core/indexes/datetimes.py:

class DatetimeIndex(<...>):

    <...>

    @classmethod
    def _generate(cls, start, end, periods, name, freq,
                  tz=None, normalize=False, ambiguous='raise', closed=None):
        <...>

                index = tools.to_datetime(np.linspace(start.value,
                                                      end.value, periods),
                                          utc=True)
                <...>

        if not left_closed and len(index) and index[0] == start:
            index = index[1:]
        if not right_closed and len(index) and index[-1] == end:
            index = index[:-1]
        index = cls._simple_new(index, name=name, freq=freq, tz=tz)
        return index

Nor is the closed information saved anywhere, so you can't even infer end from the first/last point and the step.
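To see why end can't be recovered, here is a sketch that mirrors the trimming step from the excerpt on top of pd.date_range. The function generate is hypothetical and only approximates the real _generate; it shows three different end/closed combinations collapsing to the same index:

```python
import pandas as pd


def generate(start, end, freq, left_closed=True, right_closed=True):
    """Hypothetical re-creation of the trimming step quoted above."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    index = pd.date_range(start=start, end=end, freq=freq)
    # Drop the endpoints only if they land exactly on the grid, as in the excerpt
    if not left_closed and len(index) and index[0] == start:
        index = index[1:]
    if not right_closed and len(index) and index[-1] == end:
        index = index[:-1]
    return index


a = generate('2018-01-01', '2018-01-04', '1D', right_closed=False)        # like closed='left'
b = generate('2018-01-01', '2018-01-03', '1D')                            # both ends included
c = generate('2018-01-01', '2018-01-03 12:00', '1D', right_closed=False)  # end off the grid
print(a.equals(b) and b.equals(c))  # three different `end` values, one index
```

Since a, b and c hold identical values, no function of the index alone can tell which end produced it.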


You can subclass DatetimeIndex and save this information. Note that it's an immutable type, so you need to override __new__ instead of __init__:

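import collections
import inspect

import pandas as pd

# Written for old pandas: DatetimeIndex's start/end/closed arguments were
# deprecated in 0.24 and removed in 1.0.
class SiDatetimeIndex(pd.DatetimeIndex):

    # Record of the constructor arguments that define the interval
    _Interval = collections.namedtuple('Interval',
            ('start', 'end', 'freq', 'closed'))
    # add 'interval' to dir(): DatetimeIndex inherits pandas.core.accessor.DirNamesMixin
    _accessors = pd.DatetimeIndex._accessors | frozenset(('interval',))

    def __new__(cls, *args, **kwargs):
        base_new = super(SiDatetimeIndex, cls).__new__
        # Bind the call's arguments to the base signature so defaults are filled in
        callargs = inspect.getcallargs(base_new, cls, *args, **kwargs)
        result = base_new(**callargs)
        # Stash the interval-defining arguments on the new instance
        result.interval = cls._Interval._make(callargs[a] for a in cls._Interval._fields)
        return result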


In [31]: index = SiDatetimeIndex(start='2018-01-01',
    ...:                         end='2018-01-04',
    ...:                         freq='1D',
    ...:                         closed='left')

In [38]: index.interval
Out[38]: Interval(start='2018-01-01', end='2018-01-04', freq='1D', closed='left')

Don't expect, though, that all pandas methods (including the ones your class inherits) will now magically start creating instances of your subclass. For that, you'd need to replace the live references to the base class in the loaded pandas modules that those methods use. Alternatively, you can replace just the original class's __new__; then there's no need to replace references.

ivan_pozdeev

Can something like this work for you?

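import pandas as pd

index = pd.DatetimeIndex(start='2018-01-01', end='2018-01-04', freq='1D', closed='left')

def get_end(index, freq):
    if freq == '1D':
        # Timestamp + int is no longer supported, so add an explicit one-day Timedelta
        return index.max() + pd.Timedelta(days=1)

get_end(index, '1D')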

You can write similar logic for 1D/2D/1M. Alternatively, encode the frequency in the name of the DatetimeIndex column as a suffix or prefix, e.g. 'purchase_date_1D', and parse it back if you don't want to pass it as a separate input at all.
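One way to write that per-frequency logic without an if branch per frequency string is to parse freq into a pandas offset with pandas.tseries.frequencies.to_offset and add one step. This is only a sketch of the answer's idea, not a fix for the question: as the comments point out, it reproduces index[-1] + index.freq and so returns 2018-01-05 rather than 2018-01-04 for freq='2D'.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def get_end(index, freq):
    """Return index.max() plus one step of `freq` (e.g. '1D', '2D').

    Caveat: this equals the original `end` only when `end` fell exactly one
    step after the last point; otherwise it returns a later timestamp.
    """
    return index.max() + to_offset(freq)


index = pd.date_range(start='2018-01-01', end='2018-01-03', freq='1D')
print(get_end(index, '1D'))  # 2018-01-04 00:00:00
```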

wololo