
I have the following dataframe:

import pandas as pd

dt = pd.DataFrame({'start_date': ['2019-05-20', '2019-05-21', '2019-05-21'],
                   'end_date': ['2019-05-23', '2019-05-24', '2019-05-22'],
                   'reg': ['A', 'B', 'A'],
                   'measure': [100, 200, 1000]})

I would like to create a new column, called 'date', which will have values from start_date until end_date, and also a new column measure_daily, which will be the measure spread equally among these dates.

So basically, I would like to expand dt in terms of rows. The final df would look like:

dt_f = pd.DataFrame({'date': ['2019-05-20', '2019-05-21', '2019-05-22', '2019-05-23',
                              '2019-05-21', '2019-05-22', '2019-05-23', '2019-05-24',
                              '2019-05-21', '2019-05-22'],
                     'reg': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
                     'measure_daily': [25, 25, 25, 25, 50, 50, 50, 50, 500, 500]})

Is there an efficient way to do this in Python?


1 Answer


TL;DR

just give me the solution:

# Tag each original row with a key so duplicate reg values stay separate
dt = dt.assign(key=dt.index)

# Melt start_date and end_date into a single 'date' column
melt = dt.melt(id_vars=['reg', 'measure', 'key'], value_name='date').drop('variable', axis=1)
melt['date'] = pd.to_datetime(melt['date'])  # resample needs datetimes

# Resample each (reg, key) group to daily frequency and forward-fill
melt = pd.concat(
    [d.set_index('date').resample('d').first().ffill()
     for _, d in melt.groupby(['reg', 'key'], sort=False)]
).reset_index()

# Divide measure by each group's number of days, then drop the helper key
melt.assign(measure=melt['measure'].div(melt.groupby(['reg', 'key'], sort=False)['reg'].transform('size'))).drop('key', axis=1)

Breakdown:

First we tag each row with a key (its index), then melt your start and end dates into a single column:

dt = dt.assign(key=dt.index)
melt = dt.melt(id_vars=['reg', 'measure', 'key'], value_name='date').drop('variable', axis=1)
melt['date'] = pd.to_datetime(melt['date'])

  reg  measure  key       date
0   A      100    0 2019-05-20
1   B      200    1 2019-05-21
2   A     1000    2 2019-05-21
3   A      100    0 2019-05-23
4   B      200    1 2019-05-24
5   A     1000    2 2019-05-22

Then we resample on a daily basis, grouping on ['reg', 'key'] so that each original row stays in its own group:

melt = pd.concat(
    [d.set_index('date').resample('d').first().ffill() for _, d in melt.groupby(['reg', 'key'], sort=False)]
).reset_index()

        date reg  measure  key
0 2019-05-20   A    100.0  0.0
1 2019-05-21   A    100.0  0.0
2 2019-05-22   A    100.0  0.0
3 2019-05-23   A    100.0  0.0
4 2019-05-21   B    200.0  1.0
5 2019-05-22   B    200.0  1.0
6 2019-05-23   B    200.0  1.0
7 2019-05-24   B    200.0  1.0
8 2019-05-21   A   1000.0  2.0
9 2019-05-22   A   1000.0  2.0

Finally, we spread the measure column over the size of each group with assign and drop the helper key:

melt.assign(measure=melt['measure'].div(melt.groupby(['reg', 'key'], sort=False)['reg'].transform('size'))).drop('key', axis=1)

        date reg  measure
0 2019-05-20   A     25.0
1 2019-05-21   A     25.0
2 2019-05-22   A     25.0
3 2019-05-23   A     25.0
4 2019-05-21   B     50.0
5 2019-05-22   B     50.0
6 2019-05-23   B     50.0
7 2019-05-24   B     50.0
8 2019-05-21   A    500.0
9 2019-05-22   A    500.0
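
For completeness, a shorter alternative sketch that also keeps overlapping date ranges within the same reg separate (each original row keeps its own index): build a per-row date range with pd.date_range, explode it into one row per day, and divide measure by the number of days. This assumes pandas >= 0.25, which introduced DataFrame.explode, and starts from the original dt without the helper key column.

# Alternative sketch; assumes pandas >= 0.25 for DataFrame.explode
dt['date'] = [pd.date_range(s, e, freq='d')
              for s, e in zip(dt['start_date'], dt['end_date'])]
out = dt.explode('date')

# Each original row keeps its index, so the group size is that row's day count
out['measure_daily'] = out['measure'] / out.groupby(level=0)['date'].transform('size')
out = out[['date', 'reg', 'measure_daily']].reset_index(drop=True)

Because the split is done per original row, two rows for the same reg with overlapping date ranges simply produce separate daily rows.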
  • Do you think there is a way to extend this solution in case you had overlapping dates within the same region? – quant Jul 26 '19 at 11:30
  • What would overlapping dates look like within the same region? You mean that somewhere further down the dataset `region A` will show up again with overlapping dates? @quant – Erfan Jul 26 '19 at 11:33
  • Yes, exactly. @killswitch's solution, I think, does that – quant Jul 26 '19 at 11:35
  • Last question: does it happen that a new `reg` further down the dataframe has the same `measure` value as the same reg further up the dataframe? @quant – Erfan Jul 26 '19 at 11:47
  • Last edit: this should give the correct output and should run fast on a big dataframe – Erfan Jul 26 '19 at 11:54