Pandas: is there a way to 1) add two new empty rows above every row in a dataframe, and 2) fill rows with same value?

Question

I have a simple data frame (data from the Tropical Rainfall Measuring Mission, TRMM, in case that helps provide context), one column for the datetime and one for precipitation measurement, that looks like this:

                        ppt
date            

1998-01-01 03:00:00     0.00    
1998-01-01 06:00:00     0.00    
1998-01-01 09:00:00     0.03    
1998-01-01 12:00:00     0.20

The readings are every three hours and the values are 3-hour averages of rainfall per hour for the previous three hours. I would like to create a dataframe that contains rainfall measurements for every hour, so it would look like this:

                        ppt
date            
1998-01-01 01:00:00     0.00
1998-01-01 02:00:00     0.00    
1998-01-01 03:00:00     0.00
1998-01-01 04:00:00     0.00
1998-01-01 05:00:00     0.00    
1998-01-01 06:00:00     0.00
1998-01-01 07:00:00     0.03
1998-01-01 08:00:00     0.03    
1998-01-01 09:00:00     0.03
1998-01-01 10:00:00     0.20
1998-01-01 11:00:00     0.20    
1998-01-01 12:00:00     0.20

Any ideas of how I might go about doing this?

Look into `resample` or `asfreq`. – cs95 May 22 '19 at 17:38 — cs95, May 22 '19 at 17:38

piRSquared · Accepted Answer · 2019-05-22T18:04:56.473

1

IIUC

In order to get the ones above:

# repeated decreasing number of hours
# [2 hr, 1 hr, 0 hr, 2 hr, 1 hr, 0 hr, ...]
d = np.tile(np.arange(3)[::-1], len(df)) * pd.Timedelta(1, unit='H')

# repeat the index 3 times for every entry
# [3:00, 3:00, 3:00, 6:00, 6:00, 6:00, ...]
i = df.index.repeat(3)
df_ = df.loc[i]

# take care of differences
# [3:00, 3:00, 3:00, 6:00, 6:00, 6:00, ...]
#  minus
# [2 hr, 1 hr, 0 hr, 2 hr, 1 hr, 0 hr, ...]
# [1:00, 2:00, 3:00, 4:00, 5:00, 6:00, ...]
df_.index -= d

df_

                      ppt
date                     
1998-01-01 01:00:00  0.00
1998-01-01 02:00:00  0.00
1998-01-01 03:00:00  0.00
1998-01-01 04:00:00  0.00
1998-01-01 05:00:00  0.00
1998-01-01 06:00:00  0.00
1998-01-01 07:00:00  0.03
1998-01-01 08:00:00  0.03
1998-01-01 09:00:00  0.03
1998-01-01 10:00:00  0.20
1998-01-01 11:00:00  0.20
1998-01-01 12:00:00  0.20

`asfreq` and `resample`

Only get you this far

df.asfreq('H').bfill()

                      ppt
date                     
1998-01-01 03:00:00  0.00
1998-01-01 04:00:00  0.00
1998-01-01 05:00:00  0.00
1998-01-01 06:00:00  0.00
1998-01-01 07:00:00  0.03
1998-01-01 08:00:00  0.03
1998-01-01 09:00:00  0.03
1998-01-01 10:00:00  0.20
1998-01-01 11:00:00  0.20
1998-01-01 12:00:00  0.20

We're missing the

1998-01-01 01:00:00  0.00
1998-01-01 02:00:00  0.00

in the beginning

edited May 22 '19 at 18:04

answered May 22 '19 at 17:56

piRSquared

285,575
57
475
624

Thanks! To assist my understanding of what's going on there, could you add some comments? – SHV_la May 22 '19 at 17:59
@SHV_la see if that helps. – piRSquared May 22 '19 at 18:00
Hi Sir, Could you please adding this(link) type of question in Pivot Q&A, I have been seen this type of question more than 10 times .https://stackoverflow.com/questions/56263686/converting-rows-from-one-pandas-dataframe-column-to-multiple-columns-without-num – BENY May 22 '19 at 19:42
@piRSquared if the sequence started with 1998-01-01 00:00:00, presumably that backfill would work with asfreq? – SHV_la May 22 '19 at 19:44
@piRSquared yes, adding the above timestamp does work; thanks so so much! – SHV_la May 22 '19 at 19:47

score 0 · Answer 2 · answered May 22 '19 at 18:11

You can use resample with a back-fill, provided you specify the start time correctly:

import pandas as pd
import numpy as np


#specify start and end times so that the range to fill is clear
start = pd.Timestamp('1998-01-01 00:00:00')
end = pd.Timestamp('1998-01-01 12:00:00')
t = np.linspace(start.value, end.value, 5)
t = pd.to_datetime(t)
df=pd.DataFrame(index=t)

#populate existing values
df['ppt']=[0.,0.,0.,0.03,0.2]

#resample and fill backwards
df.resample('1H').bfill()

Pandas: is there a way to 1) add two new empty rows above every row in a dataframe, and 2) fill rows with same value?

2 Answers2

IIUC

asfreq and resample

`asfreq` and `resample`