3

I have some time series data as:

import pandas as pd    
index = pd.date_range('06/01/2014',periods=24*30,freq='H')
df1 = pd.DataFrame(range(len(index)),index=index)

Now I want to subset data of below dates

selec_dates = ['2014-06-10','2014-06-15','2014-06-20']

I tried following statement but it is not working

sub_data = df1.loc[df1.index.isin(pd.to_datetime(selec_dates))]

Where am I doing wrong? Is there any other approach to subset selected days data?

MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
Haroon Lone
  • 2,837
  • 5
  • 29
  • 65

5 Answers5

6

You need compare dates and for test membership use numpy.in1d:

sub_data = df1.loc[np.in1d(df1.index.date, pd.to_datetime(selec_dates).date)]
print (sub_data)
                      a
2014-06-10 00:00:00  216
2014-06-10 01:00:00  217
2014-06-10 02:00:00  218
2014-06-10 03:00:00  219
2014-06-10 04:00:00  220
2014-06-10 05:00:00  221
2014-06-10 06:00:00  222
2014-06-10 07:00:00  223
2014-06-10 08:00:00  224
2014-06-10 09:00:00  225
2014-06-10 10:00:00  226
...

If want use isin, is necessary create Series with same index:

sub_data = df1.loc[pd.Series(df1.index.date, index=df1.index)
                     .isin(pd.to_datetime(selec_dates).date)]
print (sub_data)
                       a
2014-06-10 00:00:00  216
2014-06-10 01:00:00  217
2014-06-10 02:00:00  218
2014-06-10 03:00:00  219
2014-06-10 04:00:00  220
2014-06-10 05:00:00  221
2014-06-10 06:00:00  222
2014-06-10 07:00:00  223
2014-06-10 08:00:00  224
2014-06-10 09:00:00  225
2014-06-10 10:00:00  226
2014-06-10 11:00:00  227
...
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
3

I'm sorry and misunderstood your question

df1[pd.Series(df1.index.date, index=df1.index).isin(pd.to_datetime(selec_dates).date)]

Should perform what was needed

original answer

Please check the pandas documentation on selection

You can easily do

sub_data = df1.loc[pd.to_datetime(selec_dates)]
Maarten Fabré
  • 6,938
  • 1
  • 17
  • 36
2

Edit: I have been made aware this only works if you are working with a daterange in the same month and year as in your query. For a more general (and better answer) see @jezrael solution.

You can use np.in1d and .day on your index if you wanted to do it as you tried:

selec_dates = ['2014-06-10','2014-06-15','2014-06-20']

df1.loc[np.in1d(df1.index.day, (pd.to_datetime(selec_dates).day))]

This gives you as you require:

2014-06-10 00:00:00  216
2014-06-10 01:00:00  217
2014-06-10 02:00:00  218
2014-06-10 03:00:00  219
2014-06-10 04:00:00  220
2014-06-10 05:00:00  221
2014-06-10 06:00:00  222
2014-06-10 07:00:00  223
2014-06-10 08:00:00  224
2014-06-10 09:00:00  225
2014-06-10 10:00:00  226
2014-06-10 11:00:00  227
2014-06-10 12:00:00  228
2014-06-10 13:00:00  229
2014-06-10 14:00:00  230
2014-06-10 15:00:00  231
2014-06-10 16:00:00  232
2014-06-10 17:00:00  233
2014-06-10 18:00:00  234
2014-06-10 19:00:00  235
2014-06-10 20:00:00  236
2014-06-10 21:00:00  237
2014-06-10 22:00:00  238
2014-06-10 23:00:00  239
2014-06-15 00:00:00  336
2014-06-15 01:00:00  337
2014-06-15 02:00:00  338
2014-06-15 03:00:00  339
2014-06-15 04:00:00  340
2014-06-15 05:00:00  341
                 ...
2014-06-15 18:00:00  354
2014-06-15 19:00:00  355
2014-06-15 20:00:00  356
2014-06-15 21:00:00  357
2014-06-15 22:00:00  358
2014-06-15 23:00:00  359
2014-06-20 00:00:00  456
2014-06-20 01:00:00  457
2014-06-20 02:00:00  458
2014-06-20 03:00:00  459
2014-06-20 04:00:00  460
2014-06-20 05:00:00  461
2014-06-20 06:00:00  462
2014-06-20 07:00:00  463
2014-06-20 08:00:00  464
2014-06-20 09:00:00  465
2014-06-20 10:00:00  466
2014-06-20 11:00:00  467
2014-06-20 12:00:00  468
2014-06-20 13:00:00  469
2014-06-20 14:00:00  470
2014-06-20 15:00:00  471
2014-06-20 16:00:00  472
2014-06-20 17:00:00  473
2014-06-20 18:00:00  474
2014-06-20 19:00:00  475
2014-06-20 20:00:00  476
2014-06-20 21:00:00  477
2014-06-20 22:00:00  478
2014-06-20 23:00:00  479

[72 rows x 1 columns]

I used these Sources for this answer:
- Selecting a subset of a Pandas DataFrame indexed by DatetimeIndex with a list of TimeStamps
- In Python-Pandas, How can I subset a dataframe by specific datetime index values?
- return pandas DF column with the number of days elapsed between index and today's date
- Get weekday/day-of-week for Datetime column of DataFrame
- https://stackoverflow.com/a/36893416/2254228

Community
  • 1
  • 1
Chuck
  • 3,664
  • 7
  • 42
  • 76
  • @jezrael You can see as I used `.day` and you used `.date` Genuinely didn't see you had posted when I changed the answer. Just wouldn't want you to think I'd try and say your excellence is my own without providing a source! :) Sorry for the mixup jez. – Chuck May 12 '17 at 11:01
  • @jezrael I think it works, because the year and month is the same, as it is only the date that changes (and is his index is all in the same month/year). If in `selec_dates` there were different days over different months or different years, it would be wrong. Does that make sense? As usual though, your answer is much better for the general case of any possible date! – Chuck May 12 '17 at 11:05
  • And congrat to 1k ;) – jezrael May 12 '17 at 11:15
  • @jezrael !! :D :D I got there finally haha. Thanks for saying congrats! ;) – Chuck May 12 '17 at 11:16
2

You can use .query() method:

In [202]: df1.query('@index.normalize() in @selec_dates')
Out[202]:
                       0
2014-06-10 00:00:00  216
2014-06-10 01:00:00  217
2014-06-10 02:00:00  218
2014-06-10 03:00:00  219
2014-06-10 04:00:00  220
2014-06-10 05:00:00  221
2014-06-10 06:00:00  222
2014-06-10 07:00:00  223
2014-06-10 08:00:00  224
2014-06-10 09:00:00  225
...                  ...
2014-06-20 14:00:00  470
2014-06-20 15:00:00  471
2014-06-20 16:00:00  472
2014-06-20 17:00:00  473
2014-06-20 18:00:00  474
2014-06-20 19:00:00  475
2014-06-20 20:00:00  476
2014-06-20 21:00:00  477
2014-06-20 22:00:00  478
2014-06-20 23:00:00  479

[72 rows x 1 columns]
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
1

Use the string repr of the date, leaving out the time periods in the day.

pd.concat([df1['2014-06-10'] , df1['2014-06-15'], df1['2014-06-20']])
Meitham
  • 9,178
  • 5
  • 34
  • 45