2
dft = pd.DataFrame(randn(100000,1), columns=['A'],
     index=pd.date_range('20130101',periods=100000,freq='T'))

As you can see, I initialize a Dateframe from '2013-01-01' to ‘2013-03-11’ with 10 min interval. How can I select specific data from specific conditions?

1) Date in a list eg: If there is a list ['2013-01-02', '2013-01-04', '2013-01-06'] How can I select the data on this list date? Or how can I select the data that are not on this list date? More specifically, '2013-01-02' means all the data from '2013-01-02 00:00:00' to '2013-01-02 23:50:00'.

2) multiple slices choose eg: I wish I can select the data in multiple slices like the following: ['2013-01-02':'2013-01-03'] & ['2013-01-05':'2013-01-07'] & ['2013-01-09':'2013-01-11'] More specifically, this slice should be same as python slice which means including left but not including right.

matcha latte
  • 185
  • 1
  • 12

1 Answers1

3

Assuming this is the raw data (with a Datetime index)

dft = pd.DataFrame(np.random.randn(100000,1), columns=['A'],
     index=pd.date_range('20130101',periods=100000,freq='T'))
dft.head()
                            A
2013-01-01 00:00:00  0.313644
2013-01-01 00:01:00  0.458860
2013-01-01 00:02:00  0.841434
2013-01-01 00:03:00 -0.135846
2013-01-01 00:04:00 -0.881316

For 1), just use .isin()

myDates = ['2013-01-02', '2013-01-04', '2013-01-06']

# to get data in list
df_in = dft[pd.to_datetime(dft.index.date).isin(myDates)]
df_in.head()
                            A
2013-01-02 00:00:00  0.444005
2013-01-02 00:01:00 -0.073561
2013-01-02 00:02:00  0.256737
2013-01-02 00:03:00  1.304807
2013-01-02 00:04:00 -0.741956

# to get data not in list
df_not_in = dft[~pd.to_datetime(dft.index.date).isin(myDates)]
df_not_in_list.head()
                            A
2013-01-01 00:00:00 -0.944070
2013-01-01 00:01:00  0.225456
2013-01-01 00:02:00  0.571424
2013-01-01 00:03:00 -0.004389
2013-01-01 00:04:00  0.933229

For 2), if I understand correctly, you want to select the data using multiple datetime slices. To do this, you can use multiple index masks, from a nested list, to filter by date

myDates = [['2013-01-02','2013-01-03'],
           ['2013-01-05','2013-01-07'],
           ['2013-01-09','2013-01-11']]
df_masked = dft[
      (dft.index >= myDates[0][0]) & (dft.index <= myDates[0][1]) & \
      (dft.index >= myDates[1][0]) & (dft.index <= myDates[1][1]) & \
      (dft.index >= myDates[2][0]) & (dft.index <= myDates[2][1])
      ]
edesz
  • 11,756
  • 22
  • 75
  • 123
  • Hi WR, I have tested this method before posting the question. It didn't work. – matcha latte Oct 30 '18 at 23:11
  • That is strange. I have used the same sample data you posted and it worked. Your sample data has a `Datetime` index. Check that your real data also has a `Datetime` index in the same format as the sample data. That could be part of the problem.... – edesz Oct 30 '18 at 23:22
  • Also, if you have made an attempt, you should post your work and show the problem/error message. – edesz Oct 30 '18 at 23:29
  • For 1) it only returns 2013-01-02 00:00:00, 2013-01-04 00:00:00, 2013-01-06 00:00:00 data. Sorry I didn't make it clear. I wish I can get all the data in 2013-01-02, 2013-01-04 and 2013-01-06. – matcha latte Oct 30 '18 at 23:39
  • For 2), this solution is what I want. Thanks. – matcha latte Oct 30 '18 at 23:42
  • Thanks for the extra information. Ok, I have updated the answer for 1). That should work now - it will filter out all the hours of the daym not just `00:00:00`. – edesz Oct 31 '18 at 00:38