0

I have a function written that is intended to truncate my data before or after a given range. I pass in a tuple of dates, position 1 being the start date and position 2 being the end date.

How could I also optionally specify a time as well date to truncate b4 and after - how would i ammend my code to do that?*)

Also - I keep getting an error on my data that says:

    raise InvalidIndexError('Reindexing only valid with uniquely'
pandas.core.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index
objects

I use:

date_range = self.data.copy()
nearest_start = date_range.iloc[
            date_range.index.get_loc(datetime.datetime.strptime(split[0], '%m-%d-%Y'),
                     method='nearest')].name
...
date_range = date_range.truncate(before=nearest_start, after=nearest_end)

My date tuple:

('12-29-2011','12-23-2017')

This is my df.head() for the dataset:

                               Open       H   ...                RBE1min        O
        DateTime                              ...                                
        2007-11-06 12:45:00 -0.6437 -0.6423   ...     11/6/2007 12:45.00  21198.0
        2007-11-06 14:30:00 -0.6430 -0.6425   ...     11/6/2007 14:30.00  21355.0
        2007-11-06 19:33:00 -0.6423 -0.6400   ...     11/6/2007 19:33.00  21430.0
        2007-11-07 09:00:00 -0.6434 -0.6421   ...     11/7/2007 09:00.00  21440.0
        2007-11-07 12:26:00 -0.6433 -0.6396   ...     11/7/2007 12:26.00  21392.0

I've run the following to check for dupilcates in my index and and it returns NAT so i'm not sure why I'm still gettting this error

Any help would be greatly appreciated

dup = rbs4181.index.get_duplicates()
novawaly
  • 1,051
  • 3
  • 12
  • 27
  • https://stackoverflow.com/questions/50250022/pandas-selecting-rows-in-a-specific-time-window/50250499#50250499 (I know the error is different, hence why I'm not flagging and answering but I think it'll help :) – IMCoins Nov 19 '18 at 18:46
  • @IMCoins I've gone though that thread and checked my index for duplicates but i'm still getting that error. Any other advice? – novawaly Nov 19 '18 at 19:04
  • I suggest you do that just to make sure you have no dupe ;) https://stackoverflow.com/q/35084071/8003790 – IMCoins Nov 19 '18 at 19:25

1 Answers1

0

Not exactly what you want I fear. But it might help:

I used this dataset http://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+# for testing.

For datetime indexes dates and strings that parse to timestamps can be passed as indexing parameters and as before and after parameters for truncate as well as pd.Timestamp.

In [1]: import pandas as pd

In [2]: df = pd.read_csv('datatest2.txt', parse_dates=[1], index_col=[1])

In [3]: df.index
Out[3]:
DatetimeIndex(['2015-02-11 14:48:00', '2015-02-11 14:49:00',
               '2015-02-11 14:50:00', '2015-02-11 14:51:00',
               '2015-02-11 14:51:59', '2015-02-11 14:53:00',
               '2015-02-11 14:54:00', '2015-02-11 14:55:00',
               '2015-02-11 14:55:59', '2015-02-11 14:57:00',
               ...
               '2015-02-18 09:10:00', '2015-02-18 09:10:59',
               '2015-02-18 09:11:59', '2015-02-18 09:13:00',
               '2015-02-18 09:14:00', '2015-02-18 09:15:00',
               '2015-02-18 09:16:00', '2015-02-18 09:16:59',
               '2015-02-18 09:17:59', '2015-02-18 09:19:00'],
              dtype='datetime64[ns]', name='date', length=9752, freq=None)

In [4]: df['2015-02-12':'2015-02-13'].index
Out[4]:
DatetimeIndex(['2015-02-12 00:00:00', '2015-02-12 00:01:00',
               '2015-02-12 00:02:00', '2015-02-12 00:03:00',
               '2015-02-12 00:04:00', '2015-02-12 00:04:59',
               '2015-02-12 00:06:00', '2015-02-12 00:07:00',
               '2015-02-12 00:08:00', '2015-02-12 00:08:59',
               ...
               '2015-02-13 23:50:00', '2015-02-13 23:51:00',
               '2015-02-13 23:51:59', '2015-02-13 23:53:00',
               '2015-02-13 23:54:00', '2015-02-13 23:55:00',
               '2015-02-13 23:55:59', '2015-02-13 23:57:00',
               '2015-02-13 23:57:59', '2015-02-13 23:58:59'],
              dtype='datetime64[ns]', name='date', length=2880, freq=None)

In [5]: df.truncate(before=pd.Timestamp('2015-02-12'), after=pd.Timestamp('2015-02-14')).index
Out[5]:
DatetimeIndex(['2015-02-12 00:00:00', '2015-02-12 00:01:00',
               '2015-02-12 00:02:00', '2015-02-12 00:03:00',
               '2015-02-12 00:04:00', '2015-02-12 00:04:59',
               '2015-02-12 00:06:00', '2015-02-12 00:07:00',
               '2015-02-12 00:08:00', '2015-02-12 00:08:59',
               ...
               '2015-02-13 23:51:00', '2015-02-13 23:51:59',
               '2015-02-13 23:53:00', '2015-02-13 23:54:00',
               '2015-02-13 23:55:00', '2015-02-13 23:55:59',
               '2015-02-13 23:57:00', '2015-02-13 23:57:59',
               '2015-02-13 23:58:59', '2015-02-14 00:00:00'],
              dtype='datetime64[ns]', name='date', length=2881, freq=None)

In [6]: df.truncate(before='2015-02-12', after='2015-02-14').index
Out[6]:
DatetimeIndex(['2015-02-12 00:00:00', '2015-02-12 00:01:00',
               '2015-02-12 00:02:00', '2015-02-12 00:03:00',
               '2015-02-12 00:04:00', '2015-02-12 00:04:59',
               '2015-02-12 00:06:00', '2015-02-12 00:07:00',
               '2015-02-12 00:08:00', '2015-02-12 00:08:59',
               ...
               '2015-02-13 23:51:00', '2015-02-13 23:51:59',
               '2015-02-13 23:53:00', '2015-02-13 23:54:00',
               '2015-02-13 23:55:00', '2015-02-13 23:55:59',
               '2015-02-13 23:57:00', '2015-02-13 23:57:59',
               '2015-02-13 23:58:59', '2015-02-14 00:00:00'],
              dtype='datetime64[ns]', name='date', length=2881, freq=None)

In [7]: df.truncate(before='2015-02-12 01:00:00', after='2015-02-13 23:00:00').index
Out[7]:
DatetimeIndex(['2015-02-12 01:00:00', '2015-02-12 01:01:00',
               '2015-02-12 01:01:59', '2015-02-12 01:02:59',
               '2015-02-12 01:04:00', '2015-02-12 01:05:00',
               '2015-02-12 01:06:00', '2015-02-12 01:07:00',
               '2015-02-12 01:08:00', '2015-02-12 01:08:59',
               ...
               '2015-02-13 22:51:00', '2015-02-13 22:52:00',
               '2015-02-13 22:53:00', '2015-02-13 22:53:59',
               '2015-02-13 22:54:59', '2015-02-13 22:56:00',
               '2015-02-13 22:57:00', '2015-02-13 22:58:00',
               '2015-02-13 22:59:00', '2015-02-13 22:59:59'],
              dtype='datetime64[ns]', name='date', length=2761, freq=None)

Hence I think you need to only modify your function to validate whether valid dates (and times) were passed.