3

I have a dataset that looks like this:

enter image description here

df = pd.DataFrame({
    'weekstart':['01-Jan-18','08-Jan-18','15-Jan-18','22-Jan-18'],
    'weekend':['07-Jan-18','14-Jan-18','21-Jan-18','28-Jan-18'],
    'Spend':[34235.922,37359.6048,38916.1164,36903.8628],
    'Daily':[4890.846,5337.086,5559.445,5271.98],
})

I would like to expand this out to take the daily value and create a new dataset with daily dates to form something like this:

enter image description here

Thank you!!

2 Answers2

1
  1. Convert to datetime
  2. Use pd.date_range to create a column of all dates between "weekstart" and "weekend"
  3. Use explode to split into individual rows.
df["weekstart"] = pd.to_datetime(df["weekstart"])
df["weekend"] = pd.to_datetime(df["weekend"])

df["days"] = df.apply(lambda x: pd.date_range(x["weekstart"], x["weekend"], freq="D"), axis=1)
df = df.explode("days")
output = df[["days", "Daily"]]

>>> output

        days     Daily
0 2018-01-01  4890.846
0 2018-01-02  4890.846
0 2018-01-03  4890.846
0 2018-01-04  4890.846
0 2018-01-05  4890.846
0 2018-01-06  4890.846
0 2018-01-07  4890.846
1 2018-01-08  5337.086
1 2018-01-09  5337.086
1 2018-01-10  5337.086
1 2018-01-11  5337.086
1 2018-01-12  5337.086
1 2018-01-13  5337.086
1 2018-01-14  5337.086
2 2018-01-15  5559.445
2 2018-01-16  5559.445
2 2018-01-17  5559.445
2 2018-01-18  5559.445
2 2018-01-19  5559.445
2 2018-01-20  5559.445
2 2018-01-21  5559.445
3 2018-01-22  5271.980
3 2018-01-23  5271.980
3 2018-01-24  5271.980
3 2018-01-25  5271.980
3 2018-01-26  5271.980
3 2018-01-27  5271.980
3 2018-01-28  5271.980
not_speshal
  • 22,093
  • 2
  • 15
  • 30
  • i keep getting an error when trying to run this solution: ValueError: Length of values does not match length of index – Himansu Odedra Nov 08 '21 at 16:47
  • This works on the sample you provided. Find where your error comes from and include a minimum reproducible example in your question that causes this error. – not_speshal Nov 08 '21 at 19:41
  • I think the problem I am having is - the weekstart/ weekend isn't in datetime format just dtype = 'o'. This example dataset is ^ but my actual csv I am importing isn't. Could you help with this please – Himansu Odedra Nov 08 '21 at 20:26
0

It seems like an easy problem. Iterate the existing dataframe using df.iterrows() and then get the current date range then use this range to iterate a range of dates and use these dates to populate another data frame with the detail dates and amounts. I iteriate the dataframe because I am expanding the data into a new dataframe rather than added a new column using a dataframe explode

df  = pd.DataFrame({
'weekstart':['01-Jan-18','08-Jan-18','15-Jan-18','22-Jan-18'],
'weekend':['07-Jan-18','14-Jan-18','21-Jan-18','28-Jan-18'],
'Spend':[34235.922,37359.6048,38916.1164,36903.8628],
'Daily':[4890.846,5337.086,5559.445,5271.98],
})

#for key, value in df.iterrows():
#   fill a range with weekstart and weekend date ranges:
#   iterate the range and populate the new data frame 

df2=pd.DataFrame(columns=['Date','Amount'])
df2['Date']=pd.to_datetime(df2['Date'])

for key,value in df.iterrows():
    amount=value['Daily']
    for date in pd.date_range(value['weekstart'], value['weekend']):
        df2=df2.append({"Date":date, "Amount":amount},ignore_index=True)

index=range(1,len(df2)+1) 
df2.set_index(pd.Index(index),'index',inplace=True)

plt.plot(df2['Date'],df2['Amount'])
plt.xticks(rotation=90)
plt.show()

print(df2)

output:

Date    Amount
1  2018-01-01  4890.846
2  2018-01-02  4890.846
3  2018-01-03  4890.846
4  2018-01-04  4890.846
5  2018-01-05  4890.846
6  2018-01-06  4890.846
7  2018-01-07  4890.846
8  2018-01-08  5337.086
9  2018-01-09  5337.086
10 2018-01-10  5337.086
11 2018-01-11  5337.086
12 2018-01-12  5337.086
13 2018-01-13  5337.086
14 2018-01-14  5337.086
15 2018-01-15  5559.445
16 2018-01-16  5559.445
17 2018-01-17  5559.445
18 2018-01-18  5559.445
19 2018-01-19  5559.445
20 2018-01-20  5559.445
21 2018-01-21  5559.445
22 2018-01-22  5271.980
23 2018-01-23  5271.980
24 2018-01-24  5271.980
25 2018-01-25  5271.980
26 2018-01-26  5271.980
27 2018-01-27  5271.980
28 2018-01-28  5271.980
Golden Lion
  • 3,840
  • 2
  • 26
  • 35
  • You [shouldn't iterate over dataframes](https://stackoverflow.com/a/55557758/9857631) when there are other options. – not_speshal Nov 08 '21 at 17:05