I'm trying to create a data frame from a start date and end date, for a number of asset_id's and turn it into a list of half-hours for each asset_id between the start and end date with the values of some_property filled forward.
I've tried Grouped and resample from the documentation and examples from SO but am stumped how to get this done.
Consider example:
some_time = datetime(2018,4,2,20,20,42)
start_date = datetime(some_time.year,some_time.month,some_time.day).astimezone(pytz.timezone('Europe/London'))
end_date = start_date + timedelta(days=1)
start_date = start_date + timedelta(hours=some_time.hour,minutes=(0 if some_time.minute < 30 else 30 ))
df = pd.DataFrame(['A','B'],columns=['asset_id'])
df2=df.copy()
df['HH'] = start_date
df2['HH'] = end_date
df['some_property']=0
df.loc[df['asset_id']=='B','some_property']=2
df = df.append(df2)
print(df)
Example df:
HH asset_id some_property
0 2018-04-02 20:00:00+01:00 A 0.0
1 2018-04-02 20:00:00+01:00 B 2.0
0 2018-04-03 00:00:00+01:00 A NaN
1 2018-04-03 00:00:00+01:00 B NaN
My desired df is this:
HH asset_id some_property
0 2018-04-02 20:00:00+01:00 A 0
4 2018-04-02 20:30:00+01:00 A 0
6 2018-04-02 21:00:00+01:00 A 0
...
2 2018-04-03 00:00:00+01:00 A 0
1 2018-04-02 20:00:00+01:00 B 2
5 2018-04-02 20:30:00+01:00 B 2
...
3 2018-04-03 00:00:00+01:00 B 2
example by dumb for-loop as below.
for ii in range(1,((24-some_time.hour)*2)-1):
temp_time = start_date + timedelta(minutes=30*ii)
for jj in ['A','B']:
df2 = df.append({'asset_id':jj,'HH':temp_time,'some_property':(0 if jj=='A' else 2)},ignore_index=True)
df = df2
df.loc[2,'some_property']=0
df.loc[3,'some_property']=2
Can anyone help? I'm trying to create dummy data and having trouble. In reality my dummy data will have multiple rows per asset_id before resampling and ffill-ing.
Update
This might be on the right track:
df2 = df.set_index('HH').groupby('asset_id').resample('30T').ffill().drop('asset_id',axis=1)
gives me:
some_property
asset_id HH
A 2018-04-02 20:00:00+01:00 0.0
2018-04-02 20:30:00+01:00 0.0
2018-04-02 21:00:00+01:00 0.0
...
2018-04-03 00:00:00+01:00 NaN
B 2018-04-02 20:00:00+01:00 2.0
2018-04-02 20:30:00+01:00 2.0
...
2018-04-03 00:00:00+01:00 NaN
I'm happy to have gained a multindex and can set the last datetime manually (or can exclude row actually).
Does this look like a sensible method? If so, question answered!