I have a pandas.DataFrame containing start and end columns, plus a couple of additional columns. I would like to expand this dataframe into a time series of business days that runs from each row's start value to its end value, copying the other columns onto every generated row. So far I came up with the following:
import pandas as pd
import datetime as dt
df = pd.DataFrame()
df['start'] = [dt.datetime(2017, 4, 3), dt.datetime(2017, 4, 5), dt.datetime(2017, 4, 10)]
df['end'] = [dt.datetime(2017, 4, 10), dt.datetime(2017, 4, 12), dt.datetime(2017, 4, 17)]
df['country'] = ['US', 'EU', 'UK']
df['letter'] = ['a', 'b', 'c']
data_series = list()
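for row in df.itertuples():
    # Build one business-day range per source row
    time_range = pd.bdate_range(row.start, row.end)
    s = len(time_range)
    # Repeat the remaining columns once per generated date
    data_series += (zip(time_range, [row.start]*s, [row.end]*s, [row.country]*s, [row.letter]*s))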
columns_names = ['date', 'start', 'end', 'country', 'letter']
df = pd.DataFrame(data_series, columns=columns_names)
Starting Dataframe:
start end country letter
0 2017-04-03 2017-04-10 US a
1 2017-04-05 2017-04-12 EU b
2 2017-04-10 2017-04-17 UK c
Desired output:
date start end country letter
0 2017-04-03 2017-04-03 2017-04-10 US a
1 2017-04-04 2017-04-03 2017-04-10 US a
2 2017-04-05 2017-04-03 2017-04-10 US a
3 2017-04-06 2017-04-03 2017-04-10 US a
4 2017-04-07 2017-04-03 2017-04-10 US a
5 2017-04-10 2017-04-03 2017-04-10 US a
6 2017-04-05 2017-04-05 2017-04-12 EU b
7 2017-04-06 2017-04-05 2017-04-12 EU b
8 2017-04-07 2017-04-05 2017-04-12 EU b
9 2017-04-10 2017-04-05 2017-04-12 EU b
10 2017-04-11 2017-04-05 2017-04-12 EU b
11 2017-04-12 2017-04-05 2017-04-12 EU b
12 2017-04-10 2017-04-10 2017-04-17 UK c
13 2017-04-11 2017-04-10 2017-04-17 UK c
14 2017-04-12 2017-04-10 2017-04-17 UK c
15 2017-04-13 2017-04-10 2017-04-17 UK c
16 2017-04-14 2017-04-10 2017-04-17 UK c
17 2017-04-17 2017-04-10 2017-04-17 UK c
The problem with my solution is that it is too slow when I apply it to a much bigger dataframe (bigger mostly in terms of rows). Does anybody have any ideas for how I could speed it up? I am also open to solutions in numpy.
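
For reference, one vectorized direction that avoids the Python-level loop is sketched below. It has not been benchmarked here, and it rests on several assumptions: the function name `expand_to_business_days` is made up for illustration, start and end are midnight timestamps with end not earlier than start, and "business day" means Monday to Friday with no holiday calendar (which is also what pd.bdate_range does by default). The idea is to repeat each row once per calendar day in its range, compute the dates with a per-row day offset, and then drop weekends.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'start': pd.to_datetime(['2017-04-03', '2017-04-05', '2017-04-10']),
    'end': pd.to_datetime(['2017-04-10', '2017-04-12', '2017-04-17']),
    'country': ['US', 'EU', 'UK'],
    'letter': ['a', 'b', 'c'],
})


def expand_to_business_days(frame):
    """Hypothetical helper: one output row per weekday between start and end."""
    # Number of calendar days covered by each row, both endpoints included
    # (assumes end >= start and no time-of-day component).
    n_days = (frame['end'] - frame['start']).dt.days.to_numpy() + 1

    # Positional index of the source row for every generated calendar day.
    row_idx = np.repeat(np.arange(len(frame)), n_days)

    # Day offset within each row: 0, 1, ..., n_days[i] - 1, restarted per row.
    block_starts = np.cumsum(n_days) - n_days
    offsets = np.arange(n_days.sum()) - np.repeat(block_starts, n_days)

    # Repeat the original rows, then put the generated date in the first column.
    out = frame.iloc[row_idx].reset_index(drop=True)
    out.insert(0, 'date', out['start'] + pd.to_timedelta(offsets, unit='D'))

    # Keep Monday (0) through Friday (4) only; holidays are not handled.
    return out[out['date'].dt.dayofweek < 5].reset_index(drop=True)


result = expand_to_business_days(df)
```

On the sample data this should give the same 18 rows as the desired output above. The trade-off is that weekend rows are materialized before being filtered out, so peak memory is roughly 7/5 of the final result.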