Pandas dataframe expand rows in specific times

Question

I have a dataframe:

df =      T1           C1
     01/01/2022 11:20   2
     01/01/2022 15:40   8
     01/01/2022 17:50   3

I want to expand it such that

I will have the value in specific given times
I will have a row for each round timestamp

So if the times are given in

l=[ 01/01/2022 15:46 ,  01/01/2022 11:28]

I will have:

df_new =      T1            C1
         01/01/2022 11:20   2
         01/01/2022 11:28   2
         01/01/2022 12:00   2
         01/01/2022 13:00   2
         01/01/2022 14:00   2
         01/01/2022 15:00   2
         01/01/2022 15:40   8
         01/01/2022 15:46   8
         01/01/2022 16:00   8
         01/01/2022 17:00   8
         01/01/2022 17:50   3

I assume that for `T1` values in the input that are "round timestamps", you don't want a new row with duplicate `T1` value in the output? — constantstranger, Jan 09 '23 at 21:14

score 3 · Answer 1 · answered Jan 09 '23 at 20:36

You can add the extra dates and ffill:

df['T1'] = pd.to_datetime(df['T1'])

extra = pd.date_range(df['T1'].min().ceil('H'), df['T1'].max().floor('H'), freq='1h')

(pd.concat([df, pd.DataFrame({'T1': extra})])
   .sort_values(by='T1', ignore_index=True)
   .ffill()
 )

Output:


                   T1   C1
0 2022-01-01 11:20:00  2.0
1 2022-01-01 12:00:00  2.0
2 2022-01-01 13:00:00  2.0
3 2022-01-01 14:00:00  2.0
4 2022-01-01 15:00:00  2.0
5 2022-01-01 15:40:00  8.0
6 2022-01-01 16:00:00  8.0
7 2022-01-01 17:00:00  8.0
8 2022-01-01 17:50:00  3.0

please see edit can you fix accordingly? – Cranjis Jan 10 '23 at 07:43 — Cranjis, Jan 10 '23 at 07:43

PaulS · Answer 2 · 2023-01-11T11:06:34.283

0

Another possible solution, based on pandas.DataFrame.resample:

df['T1'] = pd.to_datetime(df['T1'])

(pd.concat([df, df.set_index('T1').resample('1H').asfreq().reset_index()])
 .sort_values('T1').ffill().dropna().reset_index(drop=True))

Output:

                   T1   C1
0 2022-01-01 11:20:00  2.0
1 2022-01-01 12:00:00  2.0
2 2022-01-01 13:00:00  2.0
3 2022-01-01 14:00:00  2.0
4 2022-01-01 15:00:00  2.0
5 2022-01-01 15:40:00  8.0
6 2022-01-01 16:00:00  8.0
7 2022-01-01 17:00:00  8.0
8 2022-01-01 17:50:00  3.0

edited Jan 11 '23 at 11:06

answered Jan 09 '23 at 21:06

PaulS

21,159
2
9
26

please see edit can you fix accordingly? – Cranjis Jan 10 '23 at 07:43

constantstranger · Answer 3 · 2023-01-09T21:20:29.577

Here is a way to do what your question asks that will ensure:

there are no duplicate times in T1 in the output, even if any of the times in the original are round hours
the results will be of the same type as the values in the C1 column of the input (in this case, integers not floats).

hours = pd.date_range(df.T1.min().ceil("H"), df.T1.max().floor("H"), freq="60min")
idx_new = df.set_index('T1').join(pd.DataFrame(index=hours), how='outer', sort=True).index
df_new = df.set_index('T1').reindex(index = idx_new, method='ffill').reset_index().rename(columns={'index':'T1'})

Output:

                   T1  C1
0 2022-01-01 11:20:00   2
1 2022-01-01 12:00:00   2
2 2022-01-01 13:00:00   2
3 2022-01-01 14:00:00   2
4 2022-01-01 15:00:00   2
5 2022-01-01 15:40:00   8
6 2022-01-01 16:00:00   8
7 2022-01-01 17:00:00   8
8 2022-01-01 17:50:00   3

Example of how round dates in the input are handled:

df = pd.DataFrame({
#'T1':pd.to_datetime(['01/01/2022 11:20','01/01/2022 15:40','01/01/2022 17:50']),
'T1':pd.to_datetime(['01/01/2022 11:00','01/01/2022 15:40','01/01/2022 17:00']),
'C1':[2,8,3]})

Input:

                   T1  C1
0 2022-01-01 11:00:00   2
1 2022-01-01 15:40:00   8
2 2022-01-01 17:00:00   3

Output (no duplicates):

                   T1  C1
0 2022-01-01 11:00:00   2
1 2022-01-01 12:00:00   2
2 2022-01-01 13:00:00   2
3 2022-01-01 14:00:00   2
4 2022-01-01 15:00:00   2
5 2022-01-01 15:40:00   8
6 2022-01-01 16:00:00   8
7 2022-01-01 17:00:00   3

please see edit can you fix accordingly? – Cranjis Jan 10 '23 at 07:44 — Cranjis, Jan 10 '23 at 07:44

Pandas dataframe expand rows in specific times

3 Answers3