5

my pandas version is 0.18 and I have a minute data that looks as follows:

Time                              
2009-01-30 09:30:00  85.11  100.11
2009-01-30 09:39:00  84.93  100.05
2009-01-30 09:40:00  84.90  100.00
2009-01-30 09:45:00  84.91   99.94
2009-01-30 09:48:00  84.81   99.90
2009-01-30 09:55:00  84.78  100.00
2009-01-30 09:56:00  84.57  100.10
2009-01-30 09:59:00  84.25  100.41
2009-01-30 10:00:00  84.32  100.60
2009-01-30 10:06:00  84.23  101.49
2009-01-30 10:09:00  84.15  101.47

I want to use data only from 9:30 and 16:00 and resample the data in 78 min intervals (i.e divide the time between 9:30 and 16:00 into 5 equal parts). My code looks as follows:

Data= Data.between_time('9:30','16:00')
tframe = '78T'
hourlym = Data.resample(tframe, base=30).mean()

The output:

Time                                      
2009-01-30 08:18:00  85.110000  100.110000
2009-01-30 09:36:00  83.950645  101.984516
2009-01-30 10:54:00  83.372294  103.093824
2009-01-30 12:12:00  83.698624  102.566897
2009-01-30 13:30:00  83.224397  103.076667
2009-01-30 14:48:00  82.641167  104.114667
2009-01-30 16:06:00        NaN         NaN
2009-01-30 17:24:00        NaN         NaN
2009-01-30 18:42:00        NaN         NaN

As you can see pandas ignores my base parameter and my output table starts from 8:18, I believe this is because pandas seeks how to properly split my whole data into 78 minute and since you cannot divide 24h into 78 mins equally this weird behavior occurs. Is it possible to force pandas to start resampling specifically from 9:30 on the 1st day? Or work only with specific hours while reampling?

Paul H
  • 65,268
  • 20
  • 159
  • 136
kroonike
  • 1,109
  • 2
  • 13
  • 20

2 Answers2

5

The base argument is applied to midnight, so in your case the sampling starts from 00:30 and adds 78 min increments from there. I see two options.

Option 1:

Figure out what the base applied to midnight should be in order to reach 9:30 (in this case 24):

Data.resample(tframe, base=24)

Option 2:

Generate the datetimeindex yourself, and resample with reindex:

index = pd.date_range('2009-01-30 09:30:00', '2009-01-30 16:00:00', freq='78min')
Data.reindex(index=index)

EDIT: for multiple days you will need to generate the timestamps yourself.

index_date = pd.date_range('2016-04-01', '2016-04-04')
index_date = pd.Series(index_date)
index_time = pd.date_range('09:30:00', '16:00:00', freq='78min')
index_time = pd.Series(index_time.time)

index = index_date.apply(
    lambda d: index_time.apply(
        lambda t: datetime.combine(d, t)
        )
    ).unstack().sort_values().reset_index(drop=True)

Here is what the code does:

  • Generate the dates and times you're interested in, and make them into series to have the apply property.
  • Using nested 'applies', loop over dates and times and combine them into a datetime object.
  • The output is a square dataframe (one column per date) so I unstack and sort the timestamps (and finally reset the index to get rid of a useless index generated along the way).

The resulting index can be used to reindex as in option 2 originally:

Data.reindex(index=index)
IanS
  • 15,771
  • 9
  • 60
  • 84
  • The problem with those two options that 24 hours, cannot be equally divided by 78 minutes, so my data starts to shift after day one (but its possible to divide the 9:30 to 16:00 data into 78 minute chunks). I want my data to have an index with exactly 5 times every day and ignore time outside 16:00 and 9:30 – kroonike May 03 '16 at 12:57
  • OK, multiple days, got it. – IanS May 03 '16 at 13:26
  • Yes, the code works! I wonder if it possible now to resample my initial data with custom function and the new index? Also, I would be really grateful if you shortly explained the logic behind the code. – kroonike May 04 '16 at 12:29
  • I'll add explanations. What custom function are you referring to? – IanS May 04 '16 at 13:00
  • 1
    I changed the code a bit to make it easier to explain. Please check. – IanS May 04 '16 at 13:20
2

Answer given by IanS is quite complete.

As I noticed the dates in your problem, I think you are trying these operations in Indian trading scenario. So, it would be better if you create your custom calendar using pd.tseries.offsets.CustomBusinessDay(holidays=holiday_list): https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html?highlight=holiday#custom-business-days

Let's say from 1980-2025 (only for B-days()) + remove(list of trading holidays from 1980-2025(this you could google it).

This will help you if you are doing paper-trading backtesting and importing data from many stocks.


Or you could install package: pip install pandas_market_calendars

import pandas_market_calendars as mcal

# Create a calendar
nyse = mcal.get_calendar('NYSE')

# Show available calendars
print(mcal.get_calendar_names())

early = nyse.schedule(start_date='2012-07-01', end_date='2012-07-10')
early

                  market_open             market_close
=========== ========================= =========================
 2012-07-02 2012-07-02 13:30:00+00:00 2012-07-02 20:00:00+00:00
 2012-07-03 2012-07-03 13:30:00+00:00 2012-07-03 17:00:00+00:00
 2012-07-05 2012-07-05 13:30:00+00:00 2012-07-05 20:00:00+00:00

mcal.date_range(early, frequency='1D')

DatetimeIndex(['2012-07-02 20:00:00+00:00', '2012-07-03 17:00:00+00:00',
               '2012-07-05 20:00:00+00:00', '2012-07-06 20:00:00+00:00',
               '2012-07-09 20:00:00+00:00', '2012-07-10 20:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)

mcal.date_range(early, frequency='1H')

DatetimeIndex(['2012-07-02 14:30:00+00:00', '2012-07-02 15:30:00+00:00',
               '2012-07-02 16:30:00+00:00', '2012-07-02 17:30:00+00:00',
               '2012-07-02 18:30:00+00:00', '2012-07-02 19:30:00+00:00',
               '2012-07-02 20:00:00+00:00', '2012-07-03 14:30:00+00:00',

After that just do

Data.reindex(index=name_of_date_object_list)
fantabolous
  • 21,470
  • 7
  • 54
  • 51
yashu vishnalia
  • 611
  • 7
  • 7