Pandas: Generating datetime between two times and pass the result as new rows in the data frame

Question

I am working on a data frame df with the following structure:

   start_time            end_time                key            vol
0  2018-08-23 00:00:00   2018-08-23 01:30:00     abcd_eg        0.92
1  2018-08-23 00:15:00   2018-08-23 01:45:00     defg_x2        0.27

I am trying to generate a 15 minute interval between a start and an end time. I would like to have to result as new rows in the same dataframe (or a new dataframe) as follows:

 start_time            end_time                  key             vol
0  2018-08-23 00:00:00   2018-08-23 01:30:00     abcd_eg         0.92
1  2018-08-23 00:15:00   2018-08-23 01:30:00     abcd_eg         0.92
2  2018-08-23 00:30:00   2018-08-23 01:30:00     abcd_eg         0.92
3  2018-08-23 00:45:00   2018-08-23 01:30:00     abcd_eg         0.92
4  2018-08-23 01:00:00   2018-08-23 01:30:00     abcd_eg         0.92
5  2018-08-23 01:15:00   2018-08-23 01:30:00     abcd_eg         0.92
6  2018-08-23 01:30:00   2018-08-23 01:30:00     abcd_eg         0.92
7  2018-08-23 00:15:00   2018-08-23 01:45:00     defg_x2         0.27
8  2018-08-23 00:30:00   2018-08-23 01:45:00     defg_x2         0.27
9  2018-08-23 00:45:00   2018-08-23 01:45:00     defg_x2         0.27
10 2018-08-23 01:00:00   2018-08-23 01:45:00     defg_x2         0.27
11 2018-08-23 01:15:00   2018-08-23 01:45:00     defg_x2         0.27
12 2018-08-23 01:30:00   2018-08-23 01:45:00     defg_x2         0.27
13 2018-08-23 01:45:00   2018-08-23 01:45:00     defg_x2         0.27

The date columns are of type datetime[64], key is object and vol is float. What I tried till now is:

b=[]
lst = []
for i, row in df.iterrows():
b = pd.date_range(start=row.start_time, end=row.end_time, freq='15min',closed=None)
lst.append(b)

Used .iterrows() as I have about 125 records. This gives me the timeseries as DatetimeIndex for all the values of start and the end time in the dataframe, with 15 minute intervals.

After this, I tried to pass lst as a new column unpack in the dataframe df as follows:

df['unpack'] = lst

My thoughts were that if I could get these values as the new column in df, I could use this solution to extract them as rows. But this process is not working.

How can I do this with pandas?

score 1 · Accepted Answer · answered Oct 03 '19 at 23:31

My fifty cents:

First recreate example df:

df = pd.DataFrame({"start_time": [datetime(2018, 8, 23), datetime(2018, 8, 23, 0, 15)],
                   "end_time": [datetime(2018, 8, 23, 1, 30), datetime(2018, 8, 23, 1, 45)],
                   "key": ["abcd_eg", "defg_x2"],
                   "vol": [0.92, 0.27]})

Loop over start times, create a new dataframe for each start_time with the desired index and store them in a list.

dfs = []
for row in df.itertuples():
    part_df = pd.DataFrame(index=pd.DatetimeIndex(start=row.start_time, end=row.end_time, freq='15T'), 
                           data={'end_time': row.end_time, 'key': row.key, 'vol': row.vol})
    part_df.index.name = 'start_time'
    dfs.append(part_df)

Now concatenate all dataframes and reset the index:

result = pd.concat(dfs).reset_index()

giving the following result:

    start_time          end_time            key     vol
0   2018-08-23 00:00:00 2018-08-23 01:30:00 abcd_eg 0.92
1   2018-08-23 00:15:00 2018-08-23 01:30:00 abcd_eg 0.92
2   2018-08-23 00:30:00 2018-08-23 01:30:00 abcd_eg 0.92
3   2018-08-23 00:45:00 2018-08-23 01:30:00 abcd_eg 0.92
4   2018-08-23 01:00:00 2018-08-23 01:30:00 abcd_eg 0.92
5   2018-08-23 01:15:00 2018-08-23 01:30:00 abcd_eg 0.92
6   2018-08-23 01:30:00 2018-08-23 01:30:00 abcd_eg 0.92
7   2018-08-23 00:15:00 2018-08-23 01:45:00 defg_x2 0.27
8   2018-08-23 00:30:00 2018-08-23 01:45:00 defg_x2 0.27
9   2018-08-23 00:45:00 2018-08-23 01:45:00 defg_x2 0.27
10  2018-08-23 01:00:00 2018-08-23 01:45:00 defg_x2 0.27
11  2018-08-23 01:15:00 2018-08-23 01:45:00 defg_x2 0.27
12  2018-08-23 01:30:00 2018-08-23 01:45:00 defg_x2 0.27
13  2018-08-23 01:45:00 2018-08-23 01:45:00 defg_x2 0.27

Much better than mine. +1 from me. – QuantStats Oct 04 '19 at 06:02 — QuantStats, Oct 04 '19 at 06:02

QuantStats · Answer 2 · 2019-10-03T11:24:06.303

Edit: I see that you had added new info. Maybe this is what you are looking for, if the number of 15-min intervals is fixed, then you can try this. Edit2: Now, it also works with non-fixed 15-min intervals.

import pandas as pd

gap = '15min'

date_start = ['2018-08-23 00:00:00','2018-08-23 00:15:00','2018-08-24 00:45:00', '2018-08-24 00:30:00']

date_end = ['2018-08-23 01:30:00','2018-08-23 01:45:00','2018-08-24 01:00:00','2018-08-24 02:45:00']

count = 0
to_repeat = []

data = {'start_time':date_start,'end_time':date_end,'key':['abcd_eg','defg_x2', 'whef_98','tuyr_23'],'vol':[0.92,0.27,0.87,0.90]}

df = pd.DataFrame(data)

for _ in zip(date_start, date_end):

  temp = pd.date_range(_[0], _[1], freq=gap)
  to_repeat.append(len(temp))

  if count==0:
    ind = temp

  else:
    ind = ind.append(temp)

  count+=1

df_final = df.reindex(df.index.repeat(to_repeat))

df_final['start_time'] = ind

df_final.reset_index(inplace=True)
df_final.drop(columns='index',inplace=True)

print(df_final)

Output

            start_time             end_time      key   vol
0  2018-08-23 00:00:00  2018-08-23 01:30:00  abcd_eg  0.92
1  2018-08-23 00:15:00  2018-08-23 01:30:00  abcd_eg  0.92
2  2018-08-23 00:30:00  2018-08-23 01:30:00  abcd_eg  0.92
3  2018-08-23 00:45:00  2018-08-23 01:30:00  abcd_eg  0.92
4  2018-08-23 01:00:00  2018-08-23 01:30:00  abcd_eg  0.92
5  2018-08-23 01:15:00  2018-08-23 01:30:00  abcd_eg  0.92
6  2018-08-23 01:30:00  2018-08-23 01:30:00  abcd_eg  0.92
7  2018-08-23 00:15:00  2018-08-23 01:45:00  defg_x2  0.27
8  2018-08-23 00:30:00  2018-08-23 01:45:00  defg_x2  0.27
9  2018-08-23 00:45:00  2018-08-23 01:45:00  defg_x2  0.27
10 2018-08-23 01:00:00  2018-08-23 01:45:00  defg_x2  0.27
11 2018-08-23 01:15:00  2018-08-23 01:45:00  defg_x2  0.27
12 2018-08-23 01:30:00  2018-08-23 01:45:00  defg_x2  0.27
13 2018-08-23 01:45:00  2018-08-23 01:45:00  defg_x2  0.27
14 2018-08-24 00:45:00  2018-08-24 01:00:00  whef_98  0.87
15 2018-08-24 01:00:00  2018-08-24 01:00:00  whef_98  0.87
16 2018-08-24 00:30:00  2018-08-24 02:45:00  tuyr_23  0.90
17 2018-08-24 00:45:00  2018-08-24 02:45:00  tuyr_23  0.90
18 2018-08-24 01:00:00  2018-08-24 02:45:00  tuyr_23  0.90
19 2018-08-24 01:15:00  2018-08-24 02:45:00  tuyr_23  0.90
20 2018-08-24 01:30:00  2018-08-24 02:45:00  tuyr_23  0.90
21 2018-08-24 01:45:00  2018-08-24 02:45:00  tuyr_23  0.90
22 2018-08-24 02:00:00  2018-08-24 02:45:00  tuyr_23  0.90
23 2018-08-24 02:15:00  2018-08-24 02:45:00  tuyr_23  0.90
24 2018-08-24 02:30:00  2018-08-24 02:45:00  tuyr_23  0.90
25 2018-08-24 02:45:00  2018-08-24 02:45:00  tuyr_23  0.90

The problem that I am facing is that the dataframe `df` has about 125 records with different `key` and `vol`. I just took the first two rows as an example. But there are records that can have 21-30 fifteen minute intervals between `start_time` and `end_time`. So in that case `to_repeat` will be changing. Could you please suggest ways to use your solution over the entire dataframe at once? The index isn't an issue though. — akshay_rao_19, Oct 03 '19 at 11:09
Yes, have a look at the new solution with dynamic to_repeat. — QuantStats, Oct 03 '19 at 11:16
@akshay_rao_19 Surely. I assume that you know how the df I use looks like for this example. — QuantStats, Oct 03 '19 at 13:46
@akshay_rao_19 I will explain the loop implementation. First use zip to store a list of pairwise `start_time` and `end_time` elements. For every pair, generate a temporary vector of time elements that are 15mins apart and store it in `ind`, and at the same time calculate the length of that temporary vector, and store it in `to_repeat`. `to_repeat` gives information on how many times each row of to repeat. Perform the repeat and reindex to create a new df (reindex is for style only). Replace the 'start_time' with `ind` in new df, and you are done. Reset index, drop are for style formatting. — QuantStats, Oct 03 '19 at 14:02
Oh great! Makes sense now. I cannot upvote your answer yet as I do not have enough rep. Will do that once I have enough. — akshay_rao_19, Oct 03 '19 at 17:10

Pandas: Generating datetime between two times and pass the result as new rows in the data frame

2 Answers2