2

I have a log file in following format:

Item Month_end_date old_price new_price row
A 2022-03-31 25 30 1
A 2022-06-30 30 40 2
A 2022-08-31 40 45 3
B 2022-04-30 80 70 4

Here, its assumed that the price of the item A from the start of the year was 25 using 1st row of the above table. I want to get monthly prices using this table. The ideal output looks like the table below:

Item Month_end_date price
A 2022-01-31 25
A 2022-02-28 25
A 2022-03-31 30
A 2022-04-30 30
A 2022-05-31 30
A 2022-06-30 40
A 2022-07-31 40
A 2022-08-31 45
A 2022-09-30 45
A 2022-10-31 45
A 2022-11-30 45
A 2022-12-31 45
B 2022-01-31 80
B 2022-02-28 80
B 2022-03-31 80
B 2022-04-30 70
B 2022-05-31 70
B 2022-06-30 70
B 2022-07-31 70
B 2022-08-31 70
B 2022-09-30 70
B 2022-10-31 70
B 2022-11-30 70
B 2022-12-31 70
  • what have you tried so far? – srinath Jul 29 '22 at 05:48
  • apply shift to the Month_end_date date. And use the following link to generate the intermediate months. https://stackoverflow.com/questions/34898525/generate-list-of-months-between-interval-in-python. Once the rows are generated, query the dataframe for prices – srinath Jul 29 '22 at 06:02

1 Answers1

4

IIUC, you can reshape, fill in the missing periods and ffill/bfill per group:

(df
 .assign(**{'Month_end_date': pd.to_datetime(df['Month_end_date'])})
 .set_index(['Item', 'Month_end_date'])
 [['old_price', 'new_price']]
 .reindex(pd.MultiIndex
            .from_product([df['Item'].unique(),
                           pd.date_range('2022-01-01',
                                         '2022-12-31',
                                         freq='M')],
                          names=['Items', 'Month_end_date'])
         )
 .stack(dropna=False)
 .groupby(level=0).apply(lambda g: g.ffill().bfill())
 .unstack()['new_price']
 .reset_index(name='price')
)

output:

   Items Month_end_date  price
0      A     2022-01-31   25.0
1      A     2022-02-28   25.0
2      A     2022-03-31   30.0
3      A     2022-04-30   30.0
4      A     2022-05-31   30.0
5      A     2022-06-30   40.0
6      A     2022-07-31   40.0
7      A     2022-08-31   45.0
8      A     2022-09-30   45.0
9      A     2022-10-31   45.0
10     A     2022-11-30   45.0
11     A     2022-12-31   45.0
12     B     2022-01-31   80.0
13     B     2022-02-28   80.0
14     B     2022-03-31   80.0
15     B     2022-04-30   70.0
16     B     2022-05-31   70.0
17     B     2022-06-30   70.0
18     B     2022-07-31   70.0
19     B     2022-08-31   70.0
20     B     2022-09-30   70.0
21     B     2022-10-31   70.0
22     B     2022-11-30   70.0
23     B     2022-12-31   70.0
mozway
  • 194,879
  • 13
  • 39
  • 75
  • Nice solution! Note that `Month_end_date` starts at `yyyy-01-31` and `price` should equal `30.0` for `Month_end_date=2022-03-31` – ko3 Jul 29 '22 at 06:17
  • 1
    @ko3 well spotted, then I guess the start date in `date_range` should be the start of the year (either hardcoded or inferred from the min date). – mozway Jul 29 '22 at 07:23
  • @ko3 I edited, minor change on the selected column, this should be correct now (I didn't check everything though) – mozway Jul 29 '22 at 07:34
  • Now, it perfectly matches the desired output (y) – ko3 Jul 29 '22 at 07:39