Keep original data points when padding a signal with pandas

Question

Consider the following test data set:

import pandas
from datetime import datetime

testdf = pandas.DataFrame({'t': [datetime(2015, 1, 1, 10,  0),
                                 datetime(2015, 1, 1, 11, 32),
                                 datetime(2015, 1, 1, 12,  0)],
                           'val': [1, 2, 3]})

I would like to interpolate this data set using simple padding, so that I have a data point at least every 30 minutes while keeping the original data points.

An appropriate result would look like this:

't'                'val'
2015-01-01 10:00   1
2015-01-01 10:30   1
2015-01-01 11:00   1
2015-01-01 11:30   1
2015-01-01 11:32   2
2015-01-01 12:00   3

What would be a good way of achieving this result, preferably using standard pandas methods?

I know of the DataFrame.resample method, but

a) I can't seem to find the right values of the how parameter to achieve the desired simple padding, and

b) I can't find a way to keep the original data points in the result.

Problem b) could of course be circumvented by manually adding the original data points to the resampled DataFrame, although I wouldn't call this a particularly neat solution.

At first I thought of using `reindex` with `method='bfill'`, but this does not keep the original data points, hence my proposed answer with `combine_first`. — IanS, Mar 10 '16 at 14:05

score 3 · Accepted Answer · edited Mar 30 '16 at 10:49

Generate a regular 30-minute index covering the missing timestamps and create a DataFrame of NaN values on it. Then merge it with the original data using the combine_first method and forward-fill the NaN values:

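import numpy

# Regular 30-minute grid filled with NaN placeholders
idx = pandas.date_range(datetime(2015, 1, 1, 10, 0), datetime(2015, 1, 1, 12, 0), freq='30min')
df = pandas.DataFrame(numpy.nan, index=idx, columns=['val'])

# Index the original data by time, merge in the grid, then pad forward
testdf.set_index('t', inplace=True)
testdf.combine_first(df).ffill()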

The documentation of the combine_first method reads:

Combine two DataFrame objects and default to non-null values in frame calling the method. Result index columns will be the union of the respective indexes and columns
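
As a rough illustration of that union behaviour, the sketch below combines the question's three samples with an all-NaN 30-minute grid and stops before any filling. The frame names `orig` and `grid` are made up for this example and are not part of the answer's code.

```python
import numpy as np
import pandas as pd

# Original, irregularly spaced samples indexed by timestamp.
orig = pd.DataFrame(
    {"val": [1, 2, 3]},
    index=pd.to_datetime(
        ["2015-01-01 10:00", "2015-01-01 11:32", "2015-01-01 12:00"]
    ),
)

# Regular 30-minute grid holding only NaN placeholders (illustrative name).
grid = pd.DataFrame(
    np.nan,
    index=pd.date_range("2015-01-01 10:00", "2015-01-01 12:00", freq="30min"),
    columns=["val"],
)

# Values from `orig` win wherever it has data; timestamps that exist only
# in `grid` are added to the index and stay NaN for now.
combined = orig.combine_first(grid)
print(combined)
```

The result has six rows: the three original timestamps keep their values, and 10:30, 11:00 and 11:30 appear as NaN rows waiting to be filled.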

Forward filling (ffill) is described in the fillna documentation as follows (source):

ffill: propagate last valid observation forward to next valid
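
For comparison, here is a minimal sketch of the same padding written as a reusable function. It swaps combine_first for a reindex onto the union of the original timestamps and the regular grid, then forward-fills. The helper name `pad_keep_original` is hypothetical, and the sketch assumes the frame is already indexed by sorted timestamps and that the grid should start at the first sample.

```python
from datetime import datetime

import pandas as pd


def pad_keep_original(df, freq="30min"):
    """Forward-fill df onto a regular grid while keeping its own timestamps.

    Hypothetical helper; assumes df has a sorted DatetimeIndex.
    """
    # Regular grid spanning the data, anchored at the first sample.
    grid = pd.date_range(df.index.min(), df.index.max(), freq=freq)
    # Sorted union of the original timestamps and the grid points.
    full_index = df.index.union(grid)
    # New rows start as NaN; ffill copies the last earlier observation into them.
    return df.reindex(full_index).ffill()


testdf = pd.DataFrame({"t": [datetime(2015, 1, 1, 10, 0),
                             datetime(2015, 1, 1, 11, 32),
                             datetime(2015, 1, 1, 12, 0)],
                       "val": [1, 2, 3]}).set_index("t")

result = pad_keep_original(testdf)
print(result)
```

On the question's data this gives the six rows listed in the expected output. Because NaN placeholders pass through the column, `val` comes back as floats rather than integers.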
