18

I am trying to go through every row in a DataFrame index and remove all rows that are not between a certain time.

I have been looking for solutions but none of them separate the Date from the Time, and all I want to do is drop the rows that are outside of a Time range.

Jeff
  • 873
  • 2
  • 10
  • 16

3 Answers3

22

You can use the between_time function directly:

ts.between_time(datetime.time(18), datetime.time(9), include_start=False, include_end=False)

Original answer:

You can use the indexer_between_time Index method.

For example, to include those times between 9am and 6pm (inclusive):

ts.ix[ts.index.indexer_between_time(datetime.time(9), datetime.time(18))]

to do the opposite and exclude those times between 6pm and 9am (exclusive):

ts.ix[ts.index.indexer_between_time(datetime.time(18), datetime.time(9),
                                    include_start=False, include_end=False)]

Note: indexer_between_time's arguments include_start and include_end are by default True, setting include_start to False means that datetimes whose time-part is precisely start_time (the first argument), in this case 6pm, will not be included.

Example:

In [1]: rng = pd.date_range('1/1/2000', periods=24, freq='H')

In [2]: ts = pd.Series(pd.np.random.randn(len(rng)), index=rng)

In [3]: ts.ix[ts.index.indexer_between_time(datetime.time(10), datetime.time(14))] 
Out[3]: 
2000-01-01 10:00:00    1.312561
2000-01-01 11:00:00   -1.308502
2000-01-01 12:00:00   -0.515339
2000-01-01 13:00:00    1.536540
2000-01-01 14:00:00    0.108617

Note: the same syntax (using ix) works for a DataFrame:

In [4]: df = pd.DataFrame(ts)

In [5]: df.ix[df.index.indexer_between_time(datetime.time(10), datetime.time(14))]
Out[5]: 
                            0
2000-01-03 10:00:00  1.312561
2000-01-03 11:00:00 -1.308502
2000-01-03 12:00:00 -0.515339
2000-01-03 13:00:00  1.536540
2000-01-03 14:00:00  0.108617
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
  • For a dataframe I had to do something like this: df.reindex(df.index[df.index.indexer_between_time(time(9), time(18))]) – Jeff Jan 26 '13 at 19:32
  • @Jeff -- I think you have to use `DataFrame.ix` like `df.ix[df.index.indexer_between_time()` on the `DataFrame` – root Jan 26 '13 at 19:35
  • @AndyHayden -- My first thought was also that it worked the same, but I think DataFrame tries to apply it to the columns (without the ix). – root Jan 26 '13 at 19:40
  • @root I think it's good practice to use ix with Series as well, and that way the syntax is identical, so I updated the first part. – Andy Hayden Jan 26 '13 at 19:42
  • 1
    @AndyHayden -- to make it complete you could also add that the method also takes `include_start` and `include_end` arguments, not very important, but may become useful sometimes :) – root Jan 26 '13 at 19:51
  • File "/usr/lib64/python2.7/site-packages/pandas/tseries/index.py", line 1712, in indexer_between_time raise NotImplementedError NotImplementedError – quant_dev Mar 18 '16 at 14:53
  • As of newer Pandas versions, I believe this is precisely what is used in `Series.between_time()`. – Brad Solomon Jul 10 '18 at 04:04
  • (See https://github.com/pandas-dev/pandas/blob/da6e26db079cd4050c1693939db6a52afb2978e1/pandas/core/generic.py#L6959) – Brad Solomon Jul 10 '18 at 04:05
  • @BradSolomon Thanks! Good point, I _don't_ think that existed at the time, but I could be wrong :) – Andy Hayden Jul 10 '18 at 18:12
4

You can also do:

rng = pd.date_range('1/1/2000', periods=24, freq='H')
ts = pd.Series(pd.np.random.randn(len(rng)), index=rng)
ts.ix[datetime.time(10):datetime.time(14)]
Out[4]: 
2000-01-01 10:00:00   -0.363420
2000-01-01 11:00:00   -0.979251
2000-01-01 12:00:00   -0.896648
2000-01-01 13:00:00   -0.051159
2000-01-01 14:00:00   -0.449192
Freq: H, dtype: float64

DataFrame works same way.

Ivelin
  • 12,293
  • 5
  • 37
  • 35
0

If you are not looking for a inplace function, then between_time is the way to go as stated by @AndyHayden.

For those looking for an inplace version, I leave this comment here: If you would like to have the selection inplace (directly applied to the pandas.DataFrame df), you can use:

df.drop(numpy.setdiff1d(df.index, df.between_time(start_time=dt.time(hours=8,minute=0),
                                               end_time=dt.time(hours=18, minute=0),
                                               include_start=True, 
                                               include_end=False).index)
       ,inplace=True)

Where dtis datetime.

Here we find the indexe, which are not in the range of the time and drop those in place.

ernesi
  • 358
  • 1
  • 8