I have the following Pandas dataframe:
shop | item | price | start_date_valid | end_date_valid |
---|---|---|---|---|
shop1 | item1 | 100.00 | 2019-01-01 | 2019-01-06 |
shop1 | item1 | 120.00 | 2019-01-07 | 2019-01-10 |
shop1 | item2 | 80.00 | 2018-12-31 | 2019-01-03 |
shop1 | item2 | 150.00 | 2019-01-04 | 2019-01-06 |
shop2 | item1 | 105.00 | 2019-01-01 | 2019-01-06 |
shop2 | item1 | 130.00 | 2019-01-07 | 2019-01-10 |
shop2 | item2 | 85.00 | 2018-12-31 | 2019-01-03 |
shop2 | item2 | 145.00 | 2019-01-04 | 2019-01-06 |
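
For reproducibility, the sample above can be built with the snippet below. It assumes the two date columns are parsed to datetime64, which the table itself does not show.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "shop": ["shop1"] * 4 + ["shop2"] * 4,
    "item": ["item1", "item1", "item2", "item2"] * 2,
    "price": [100.0, 120.0, 80.0, 150.0, 105.0, 130.0, 85.0, 145.0],
    "start_date_valid": ["2019-01-01", "2019-01-07", "2018-12-31", "2019-01-04"] * 2,
    "end_date_valid": ["2019-01-06", "2019-01-10", "2019-01-03", "2019-01-06"] * 2,
})

# Assumption: dates are stored as datetime64 rather than strings.
df["start_date_valid"] = pd.to_datetime(df["start_date_valid"])
df["end_date_valid"] = pd.to_datetime(df["end_date_valid"])
```
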
I need to get the following table as output:
shop | item | date | price |
---|---|---|---|
shop1 | item1 | 2019-01-01 | 100.00 |
shop1 | item1 | 2019-01-02 | 100.00 |
shop1 | item1 | 2019-01-03 | 100.00 |
shop1 | item1 | 2019-01-04 | 100.00 |
shop1 | item1 | 2019-01-05 | 100.00 |
shop1 | item1 | 2019-01-06 | 100.00 |
shop1 | item1 | 2019-01-07 | 120.00 |
shop1 | item1 | 2019-01-08 | 120.00 |
shop1 | item1 | 2019-01-09 | 120.00 |
shop1 | item1 | 2019-01-10 | 120.00 |
shop1 | item2 | 2018-12-31 | 80.00 |
shop1 | item2 | 2019-01-01 | 80.00 |
shop1 | item2 | 2019-01-02 | 80.00 |
shop1 | item2 | 2019-01-03 | 80.00 |
shop1 | item2 | 2019-01-04 | 150.00 |
shop1 | item2 | 2019-01-05 | 150.00 |
shop1 | item2 | 2019-01-06 | 150.00 |
The same expansion is needed for every shop/item combination (only shop1 is shown above). The problem is that the source dataframe is very large (about 100 million rows), so a simple df.groupby(['shop', 'item']).apply(...) looks unsuitable to me.
How can I produce the desired output in the fastest way possible?
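
One direction that avoids a per-group apply is to repeat each row once per day it covers with numpy.repeat and then add a per-row day offset to the start date. The following is only a minimal sketch under some assumptions: the date columns are datetime64, both bounds are inclusive, and every end date is on or after its start date. The function name expand_ranges is hypothetical, and the approach has not been measured on 100 million rows; the expanded result may also be too large to hold in memory at once, in which case it would have to be applied in chunks.

```python
import numpy as np
import pandas as pd

def expand_ranges(df):
    """Expand each validity range into one row per day (hypothetical helper)."""
    # Number of days covered by each row; both bounds are treated as inclusive.
    n_days = ((df["end_date_valid"] - df["start_date_valid"]).dt.days + 1).to_numpy()

    # Positional index of the source row for every output row.
    row_idx = np.repeat(np.arange(len(df)), n_days)

    # Day offset inside each source row: 0, 1, ..., n_days - 1.
    first_pos = np.cumsum(n_days) - n_days
    offsets = np.arange(n_days.sum()) - np.repeat(first_pos, n_days)

    out = df[["shop", "item", "price"]].iloc[row_idx].reset_index(drop=True)
    out["date"] = (df["start_date_valid"].to_numpy()[row_idx]
                   + offsets.astype("timedelta64[D]"))
    return out[["shop", "item", "date", "price"]]


# Tiny usage example with two made-up rows shaped like the data above.
sample = pd.DataFrame({
    "shop": ["shop1", "shop1"],
    "item": ["item2", "item2"],
    "price": [80.0, 150.0],
    "start_date_valid": pd.to_datetime(["2018-12-31", "2019-01-04"]),
    "end_date_valid": pd.to_datetime(["2019-01-03", "2019-01-06"]),
})
expanded = expand_ranges(sample)
```

The sketch does all the work in a handful of array operations, so its cost should scale with the number of output rows rather than with the number of shop/item groups.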