1

I have a dataframe:

df =      T1           C1
     01/01/2022 11:20   2
     01/01/2022 15:40   8
     01/01/2022 17:50   3

I want to expand it such that

  1. I will have the value in specific given times
  2. I will have a row for each round timestamp

So if the times are given in

l=[ 01/01/2022 15:46 ,  01/01/2022 11:28]

I will have:

df_new =      T1            C1
         01/01/2022 11:20   2
         01/01/2022 11:28   2
         01/01/2022 12:00   2
         01/01/2022 13:00   2
         01/01/2022 14:00   2
         01/01/2022 15:00   2
         01/01/2022 15:40   8
         01/01/2022 15:46   8
         01/01/2022 16:00   8
         01/01/2022 17:00   8
         01/01/2022 17:50   3
Cranjis
  • 1,590
  • 8
  • 31
  • 64
  • I assume that for `T1` values in the input that are "round timestamps", you don't want a new row with duplicate `T1` value in the output? – constantstranger Jan 09 '23 at 21:14

3 Answers3

3

You can add the extra dates and ffill:

df['T1'] = pd.to_datetime(df['T1'])

extra = pd.date_range(df['T1'].min().ceil('H'), df['T1'].max().floor('H'), freq='1h')

(pd.concat([df, pd.DataFrame({'T1': extra})])
   .sort_values(by='T1', ignore_index=True)
   .ffill()
 )

Output:


                   T1   C1
0 2022-01-01 11:20:00  2.0
1 2022-01-01 12:00:00  2.0
2 2022-01-01 13:00:00  2.0
3 2022-01-01 14:00:00  2.0
4 2022-01-01 15:00:00  2.0
5 2022-01-01 15:40:00  8.0
6 2022-01-01 16:00:00  8.0
7 2022-01-01 17:00:00  8.0
8 2022-01-01 17:50:00  3.0
mozway
  • 194,879
  • 13
  • 39
  • 75
0

Another possible solution, based on pandas.DataFrame.resample:

df['T1'] = pd.to_datetime(df['T1'])

(pd.concat([df, df.set_index('T1').resample('1H').asfreq().reset_index()])
 .sort_values('T1').ffill().dropna().reset_index(drop=True))

Output:

                   T1   C1
0 2022-01-01 11:20:00  2.0
1 2022-01-01 12:00:00  2.0
2 2022-01-01 13:00:00  2.0
3 2022-01-01 14:00:00  2.0
4 2022-01-01 15:00:00  2.0
5 2022-01-01 15:40:00  8.0
6 2022-01-01 16:00:00  8.0
7 2022-01-01 17:00:00  8.0
8 2022-01-01 17:50:00  3.0
PaulS
  • 21,159
  • 2
  • 9
  • 26
0

Here is a way to do what your question asks that will ensure:

  • there are no duplicate times in T1 in the output, even if any of the times in the original are round hours
  • the results will be of the same type as the values in the C1 column of the input (in this case, integers not floats).
hours = pd.date_range(df.T1.min().ceil("H"), df.T1.max().floor("H"), freq="60min")
idx_new = df.set_index('T1').join(pd.DataFrame(index=hours), how='outer', sort=True).index
df_new = df.set_index('T1').reindex(index = idx_new, method='ffill').reset_index().rename(columns={'index':'T1'})

Output:

                   T1  C1
0 2022-01-01 11:20:00   2
1 2022-01-01 12:00:00   2
2 2022-01-01 13:00:00   2
3 2022-01-01 14:00:00   2
4 2022-01-01 15:00:00   2
5 2022-01-01 15:40:00   8
6 2022-01-01 16:00:00   8
7 2022-01-01 17:00:00   8
8 2022-01-01 17:50:00   3

Example of how round dates in the input are handled:

df = pd.DataFrame({
#'T1':pd.to_datetime(['01/01/2022 11:20','01/01/2022 15:40','01/01/2022 17:50']),
'T1':pd.to_datetime(['01/01/2022 11:00','01/01/2022 15:40','01/01/2022 17:00']),
'C1':[2,8,3]})

Input:

                   T1  C1
0 2022-01-01 11:00:00   2
1 2022-01-01 15:40:00   8
2 2022-01-01 17:00:00   3

Output (no duplicates):

                   T1  C1
0 2022-01-01 11:00:00   2
1 2022-01-01 12:00:00   2
2 2022-01-01 13:00:00   2
3 2022-01-01 14:00:00   2
4 2022-01-01 15:00:00   2
5 2022-01-01 15:40:00   8
6 2022-01-01 16:00:00   8
7 2022-01-01 17:00:00   3
constantstranger
  • 9,176
  • 2
  • 5
  • 19