1

I have the following data of daily pricing:

2017-06-01  15.00
2017-06-02  20.00

I'd like to resample it to hourly prices for over 35 hours. So the first 24h will have a value of 15.00 at every sample and from 24h to 35h the price will be at 20.00.

2017-06-01 00:00    15.00
2017-06-01 01:00    15.00
2017-06-01 02:00    15.00
…
2017-06-01 23:00    15.00
2017-06-02 00:00    20.00
2017-06-02 01:00    20.00
2017-06-02 02:00    20.00
…
2017-06-02 10:00    20.00

I tried with resample('3600S').pad() but it will not work. Is it possible to create the new data range separately and use it as input for a resampling function? resample() doesn't seem to do the job here.

Cœur
  • 37,241
  • 25
  • 195
  • 267
programmar
  • 594
  • 6
  • 19

2 Answers2

0

You can create custom range of dates at hourly freq and reindex

df.index = pd.to_datetime(df.index)
rng=pd.date_range(start=df.index.min(), periods=35, freq='H')
df.reindex(rng).ffill()

                    val
2017-06-01 00:00:00 15.0
2017-06-01 01:00:00 15.0
2017-06-01 02:00:00 15.0
2017-06-01 03:00:00 15.0
2017-06-01 04:00:00 15.0
2017-06-01 05:00:00 15.0
2017-06-01 06:00:00 15.0
2017-06-01 07:00:00 15.0
2017-06-01 08:00:00 15.0
2017-06-01 09:00:00 15.0
2017-06-01 10:00:00 15.0
2017-06-01 11:00:00 15.0
2017-06-01 12:00:00 15.0
2017-06-01 13:00:00 15.0
2017-06-01 14:00:00 15.0
2017-06-01 15:00:00 15.0
2017-06-01 16:00:00 15.0
2017-06-01 17:00:00 15.0
2017-06-01 18:00:00 15.0
2017-06-01 19:00:00 15.0
2017-06-01 20:00:00 15.0
2017-06-01 21:00:00 15.0
2017-06-01 22:00:00 15.0
2017-06-01 23:00:00 15.0
2017-06-02 00:00:00 20.0
2017-06-02 01:00:00 20.0
2017-06-02 02:00:00 20.0
2017-06-02 03:00:00 20.0
2017-06-02 04:00:00 20.0
2017-06-02 05:00:00 20.0
2017-06-02 06:00:00 20.0
2017-06-02 07:00:00 20.0
2017-06-02 08:00:00 20.0
2017-06-02 09:00:00 20.0
2017-06-02 10:00:00 20.0
Vaishali
  • 37,545
  • 5
  • 58
  • 86
0

Another way to do this is to (a)resample without aggregation, (b) calculate the row-wise hourly difference and then (c) use np.where to conditionally set the value column

Sample data

d = {'date':['2017-06-01','2017-06-02', '2017-06-03'], 'value':[15,20,30]}
df = pd.DataFrame.from_dict(d)
print(df)

         date  value
0  2017-06-01     15
1  2017-06-02     20
2  2017-06-03     30

Code

from numpy import where, timedelta64
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date').asfreq("H").iloc[:35,:]
# Get time difference in hours, relative to 1st row
df['hours'] = ((df.index - df.index[0])/timedelta64(1, 'h')).astype(int)
# Conditionally set 'value' column, using time difference
df['value'] = where(df['hours']<35, 15, 20)
print(df)

Output

                     value  hours
date                             
2017-06-01 00:00:00     15      0
2017-06-01 01:00:00     15      1
2017-06-01 02:00:00     15      2
2017-06-01 03:00:00     15      3
2017-06-01 04:00:00     15      4
2017-06-01 05:00:00     15      5
2017-06-01 06:00:00     15      6
2017-06-01 07:00:00     15      7
2017-06-01 08:00:00     15      8
2017-06-01 09:00:00     15      9
2017-06-01 10:00:00     15     10
2017-06-01 11:00:00     15     11
2017-06-01 12:00:00     15     12
2017-06-01 13:00:00     15     13
2017-06-01 14:00:00     15     14
2017-06-01 15:00:00     15     15
2017-06-01 16:00:00     15     16
2017-06-01 17:00:00     15     17
2017-06-01 18:00:00     15     18
2017-06-01 19:00:00     15     19
2017-06-01 20:00:00     15     20
2017-06-01 21:00:00     15     21
2017-06-01 22:00:00     15     22
2017-06-01 23:00:00     15     23
2017-06-02 00:00:00     15     24
2017-06-02 01:00:00     15     25
2017-06-02 02:00:00     15     26
2017-06-02 03:00:00     15     27
2017-06-02 04:00:00     15     28
2017-06-02 05:00:00     15     29
2017-06-02 06:00:00     15     30
2017-06-02 07:00:00     15     31
2017-06-02 08:00:00     15     32
2017-06-02 09:00:00     15     33
2017-06-02 10:00:00     15     34
2017-06-02 11:00:00     20     35

EDIT

Instead of

df = df.set_index('date').asfreq("H").iloc[:35,:]

you can also use

df = df.set_index('date').asfreq("H")
df = df.loc[pd.date_range(start=df.index[0], periods=35, freq='H'),['value']]
edesz
  • 11,756
  • 22
  • 75
  • 123