2

I have a simple data frame (data from the Tropical Rainfall Measuring Mission, TRMM, in case that helps provide context), one column for the datetime and one for precipitation measurement, that looks like this:

                        ppt
date            

1998-01-01 03:00:00     0.00    
1998-01-01 06:00:00     0.00    
1998-01-01 09:00:00     0.03    
1998-01-01 12:00:00     0.20

The readings are every three hours and the values are 3-hour averages of rainfall per hour for the previous three hours. I would like to create a dataframe that contains rainfall measurements for every hour, so it would look like this:

                        ppt
date            
1998-01-01 01:00:00     0.00
1998-01-01 02:00:00     0.00    
1998-01-01 03:00:00     0.00
1998-01-01 04:00:00     0.00
1998-01-01 05:00:00     0.00    
1998-01-01 06:00:00     0.00
1998-01-01 07:00:00     0.03
1998-01-01 08:00:00     0.03    
1998-01-01 09:00:00     0.03
1998-01-01 10:00:00     0.20
1998-01-01 11:00:00     0.20    
1998-01-01 12:00:00     0.20    

Any ideas of how I might go about doing this?

SHV_la
  • 875
  • 1
  • 10
  • 14

2 Answers2

1

IIUC

In order to get the ones above:

# repeated decreasing number of hours
# [2 hr, 1 hr, 0 hr, 2 hr, 1 hr, 0 hr, ...]
d = np.tile(np.arange(3)[::-1], len(df)) * pd.Timedelta(1, unit='H')

# repeat the index 3 times for every entry
# [3:00, 3:00, 3:00, 6:00, 6:00, 6:00, ...]
i = df.index.repeat(3)
df_ = df.loc[i]

# take care of differences
# [3:00, 3:00, 3:00, 6:00, 6:00, 6:00, ...]
#  minus
# [2 hr, 1 hr, 0 hr, 2 hr, 1 hr, 0 hr, ...]
# [1:00, 2:00, 3:00, 4:00, 5:00, 6:00, ...]
df_.index -= d

df_

                      ppt
date                     
1998-01-01 01:00:00  0.00
1998-01-01 02:00:00  0.00
1998-01-01 03:00:00  0.00
1998-01-01 04:00:00  0.00
1998-01-01 05:00:00  0.00
1998-01-01 06:00:00  0.00
1998-01-01 07:00:00  0.03
1998-01-01 08:00:00  0.03
1998-01-01 09:00:00  0.03
1998-01-01 10:00:00  0.20
1998-01-01 11:00:00  0.20
1998-01-01 12:00:00  0.20

asfreq and resample

Only get you this far

df.asfreq('H').bfill()

                      ppt
date                     
1998-01-01 03:00:00  0.00
1998-01-01 04:00:00  0.00
1998-01-01 05:00:00  0.00
1998-01-01 06:00:00  0.00
1998-01-01 07:00:00  0.03
1998-01-01 08:00:00  0.03
1998-01-01 09:00:00  0.03
1998-01-01 10:00:00  0.20
1998-01-01 11:00:00  0.20
1998-01-01 12:00:00  0.20

We're missing the

1998-01-01 01:00:00  0.00
1998-01-01 02:00:00  0.00

in the beginning

piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • Thanks! To assist my understanding of what's going on there, could you add some comments? – SHV_la May 22 '19 at 17:59
  • @SHV_la see if that helps. – piRSquared May 22 '19 at 18:00
  • Hi Sir, Could you please adding this(link) type of question in Pivot Q&A, I have been seen this type of question more than 10 times .https://stackoverflow.com/questions/56263686/converting-rows-from-one-pandas-dataframe-column-to-multiple-columns-without-num – BENY May 22 '19 at 19:42
  • @piRSquared if the sequence started with 1998-01-01 00:00:00, presumably that backfill would work with asfreq? – SHV_la May 22 '19 at 19:44
  • @piRSquared yes, adding the above timestamp does work; thanks so so much! – SHV_la May 22 '19 at 19:47
0

You can use resample with a back-fill, provided you specify the start time correctly:

import pandas as pd
import numpy as np


#specify start and end times so that the range to fill is clear
start = pd.Timestamp('1998-01-01 00:00:00')
end = pd.Timestamp('1998-01-01 12:00:00')
t = np.linspace(start.value, end.value, 5)
t = pd.to_datetime(t)
df=pd.DataFrame(index=t)

#populate existing values
df['ppt']=[0.,0.,0.,0.03,0.2]

#resample and fill backwards
df.resample('1H').bfill()
HMReliable
  • 871
  • 5
  • 11