I would like to retain the '1min' resolution of the data and return a subset of df where the date matches one of the three dates in indexer:

import pandas as pd
import numpy as np

df = pd.DataFrame(index=pd.date_range("2013-10-08 00:00:00", "2015-10-08 00:00:00", freq="1min", tz='UTC'))
# randint's upper bound is exclusive, so this draws random 0/1 values
df['data'] = np.random.randint(0, 2, len(df))
indexer = ["2013-12-24", "2014-01-16", "2015-02-19"]

The following doesn't work:

df.loc[pd.DatetimeIndex(indexer)]

1 Answer


You have to get the integer location of each date with the function get_loc, and then you can select the rows by position with df.iloc (df.ix in older pandas versions).
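A minimal sketch of the two calls on a small index, assuming a sorted, unique, tz-aware DatetimeIndex like the one in the question (the three-day index and the variable names pos and row are illustrative only):

```python
import numpy as np
import pandas as pd

# Small illustrative index: three days of 1-minute timestamps in UTC,
# standing in for the question's two-year index.
index = pd.date_range("2013-12-23", periods=3 * 1440, freq="1min", tz="UTC")
df = pd.DataFrame({"data": np.arange(len(index))}, index=index)

# On a sorted, unique index, get_loc with an exact tz-aware Timestamp
# returns a single integer position.
pos = df.index.get_loc(pd.Timestamp("2013-12-24", tz="UTC"))
print(pos)  # 1440

# iloc then selects the row at that integer position.
row = df.iloc[pos]
print(row.name)  # 2013-12-24 00:00:00+00:00
```

The key is made tz-aware on purpose: recent pandas versions refuse to look up a tz-naive timestamp in a tz-aware index.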

However, get_loc only gives the position of midnight, and you need all 1440 minutes (24 * 60) of each day.

I used another list comprehension with the function range to get the positions from 00:00:00+00:00 to 23:59:00+00:00 of each day. Because this produces a list of lists (one range per day), the result is then flattened into a single list.
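As an alternative to building a list of ranges and flattening it, the same positions can be generated with NumPy broadcasting. This is a sketch of a swapped-in technique, not the answer's method; it assumes the midnight positions are already known (the three values below are the ones get_loc returns for the question's data):

```python
import numpy as np

MINUTES_PER_DAY = 24 * 60  # 1440 one-minute rows per day

# Midnight positions of the three wanted days.
starts = np.array([110880, 144000, 718560])

# A column of start positions plus a row of offsets 0..1439 broadcasts
# to a (3, 1440) array; ravel() flattens it one day after another.
positions = (starts[:, None] + np.arange(MINUTES_PER_DAY)).ravel()

print(positions.shape)  # (4320,)
print(positions[-1])    # 719999
```

The result is the same flat sequence of positions, so it can be passed to df.iloc unchanged.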

print(df.head())
#                           data
#2013-10-08 00:00:00+00:00     0
#2013-10-08 00:01:00+00:00     0
#2013-10-08 00:02:00+00:00     1
#2013-10-08 00:03:00+00:00     0
#2013-10-08 00:04:00+00:00     0

# list comprehension - get the integer position of each date's midnight;
# the Timestamps are made tz-aware to match the UTC index
idx = [df.index.get_loc(pd.Timestamp(i, tz='UTC')) for i in indexer]
print(idx)
#[110880, 144000, 718560]

# range(i, i + 1440) covers offsets 0..1439, i.e. all 1440 minutes of the day
idx = [range(i, i + 1440) for i in idx]
# flatten the list of ranges into one list of positions
idx = [y for x in idx for y in x]

# select rows by integer position
df = df.iloc[idx]

print(df.head())
#                           data
#2013-12-24 00:00:00+00:00     1
#2013-12-24 00:01:00+00:00     0
#2013-12-24 00:02:00+00:00     0
#2013-12-24 00:03:00+00:00     0
#2013-12-24 00:04:00+00:00     1
print(df.tail())
#                           data
#2015-02-19 23:55:00+00:00     1
#2015-02-19 23:56:00+00:00     0
#2015-02-19 23:57:00+00:00     0
#2015-02-19 23:58:00+00:00     0
#2015-02-19 23:59:00+00:00     0
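If you don't need the integer positions, a different approach (not the one used in this answer) is to compare each timestamp's calendar day with the wanted dates using a boolean mask. A minimal sketch, assuming the dates in indexer are meant as UTC days:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (randint(0, 2) draws 0 or 1).
df = pd.DataFrame(index=pd.date_range("2013-10-08", "2015-10-08", freq="1min", tz="UTC"))
df["data"] = np.random.randint(0, 2, len(df))
indexer = ["2013-12-24", "2014-01-16", "2015-02-19"]

# Interpret the wanted dates as UTC midnights to match the index's timezone.
days = pd.DatetimeIndex(indexer).tz_localize("UTC")

# normalize() floors every timestamp to midnight; isin() then keeps all
# 1-minute rows whose calendar day is one of the wanted days.
subset = df[df.index.normalize().isin(days)]
print(len(subset))  # 4320
```

This keeps the '1min' resolution and avoids computing positions by hand, at the cost of normalizing the whole index once.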