Starting with something like this:
import numpy as np
from pandas import DataFrame
time = np.array(('2015-08-01T07:00:00','2015-08-01T19:00:00'),dtype='datetime64[ns]')
heat_index = np.array([101,103])
air_temperature = np.array([96,95])
df = DataFrame({'heat_index':heat_index,'air_temperature':air_temperature},index=time)
yielding this for df:
                     air_temperature  heat_index
2015-08-01 07:00:00               96         101
2015-08-01 19:00:00               95         103
then resample daily:
df_daily = df.resample('24H',how='max')
To get this for df_daily:
            air_temperature  heat_index
2015-08-01               96         103
So by resampling using how='max', pandas resamples each 24-hour period, taking the maximum value within that period from each column independently.
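As a side note, the how= keyword was deprecated and later removed from resample in newer pandas releases. A minimal sketch of the same per-column daily maximum in method form, assuming a recent pandas and using 'D' (calendar day) as the frequency in place of '24H':

```python
import numpy as np
import pandas as pd

time = np.array(['2015-08-01T07:00:00', '2015-08-01T19:00:00'],
                dtype='datetime64[ns]')
df = pd.DataFrame({'heat_index': [101, 103],
                   'air_temperature': [96, 95]}, index=time)

# Each column's maximum is computed independently within each calendar day,
# so the two values in a row may come from different timestamps.
df_daily = df.resample('D').max()
print(df_daily)
```

This reproduces the behaviour described here: the 96 and the 103 land in the same row even though they were recorded twelve hours apart.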
But as you can see from the df output for 2015-08-01, that day's maximum heat index (which occurs at 19:00:00) is not paired with the air temperature that occurred at the same time. That is, the heat index of 103F occurred with an air temperature of 95F. This association is lost through resampling, and we end up looking at the air temperature from a different part of the day.
Is there a way to resample just one column and preserve the value in another column at the same index, so that the final outcome would look like this?
            air_temperature  heat_index
2015-08-01               95         103
My first guess is to just resample the heat_index column...
df_daily = df.resample('24H',how={'heat_index':'max'})
to get...
            heat_index
2015-08-01         103
...and then try some sort of DataFrame.loc or DataFrame.ix lookup from there, but I have been unsuccessful. Any thoughts on how to find the related value after resampling (e.g. to find the air_temperature that occurred at the same time as what is later found to be the maximum heat_index)?
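One direction that might work, sketched here as an assumption rather than a confirmed answer: instead of asking for the maximum value, ask for the timestamp at which heat_index peaks on each day (idxmax), then select the full rows at those timestamps. The names peak_times and df_peak are only illustrative, and the sketch assumes a recent pandas.

```python
import numpy as np
import pandas as pd

time = np.array(['2015-08-01T07:00:00', '2015-08-01T19:00:00'],
                dtype='datetime64[ns]')
df = pd.DataFrame({'heat_index': [101, 103],
                   'air_temperature': [96, 95]}, index=time)

# Group rows by calendar day (timestamps truncated to midnight) and find
# the timestamp at which heat_index is largest within each day.
peak_times = df.groupby(df.index.normalize())['heat_index'].idxmax()

# Select the complete rows at those timestamps, so every column in a row
# comes from the same moment in time.
df_peak = df.loc[peak_times.to_numpy()]

# Relabel each row with its day instead of the exact peak timestamp.
df_peak.index = peak_times.index
print(df_peak)
```

Grouping on the normalized index only produces groups for days that actually have data, which avoids calling idxmax on empty bins. If the maximum is tied within a day, idxmax returns the first occurrence.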