2

For each plant/product, I would like to add a record for each missing date within a date range. The range is based on the min/max dates regardless of Plant/Product. Then, for each Plant/Product, I would like to forward fill the Qty for each new record.

Here's a sample of the my initial pandas dataframe. I'm would like to turn this:

Plant  Product Date         Qty
Austin A       2019-08-30   500
Austin A       2019-09-01   700
Austin A       2019-09-03   600
Austin B       2019-08-31   2000
Austin B       2019-09-02   2100
Austin B       2019-09-04   2200
Boston A       2019-08-31   50
Boston A       2019-09-02   60

Into this:

Plant  Product Date         Qty
Austin A       2019-08-30   500
Austin A       2019-08-31   500
Austin A       2019-09-01   700
Austin A       2019-09-02   700
Austin A       2019-09-03   600
Austin A       2019-09-04   600
Austin B       2019-08-31   2000
Austin B       2019-09-01   2000
Austin B       2019-09-02   2100
Austin B       2019-09-03   2100
Austin B       2019-09-04   2200
Boston A       2019-08-31   50
Boston A       2019-09-01   50
Boston A       2019-09-02   60
Boston A       2019-09-03   60
Boston A       2019-09-04   60

There is a solution in a Stack Overflow post Pandas filling missing dates and values within group, which came close, but only seemed to work when 2 fields (in addition to the date and qty/value) are present. When I tried the following:

ICG=ICG.set_index(
    ['Date', 'Plant', 'Product']).unstack().asfreq('D').stack().sort_index(level=1).reset_index()

I received an error message on the asfreq() function:

TypeError: Cannot convert input [(Timestamp('2019-08-30 00:00:00'), 'Austin')] of type <class 'tuple'> to Timestamp
eyllanesc
  • 235,170
  • 19
  • 170
  • 241

1 Answers1

1

Convert Date column to datetime dtype. The unstack and stack solution could be adapted to double unstack as follows

df['Date'] = pd.to_datetime(df['Date'])
df_filled = (df.set_index(['Date', 'Plant', 'Product'])['Qty']
               .unstack([1,2]).asfreq('D').ffill().bfill()
               .unstack().reset_index(name='Qty'))

Out[50]:
     Plant Product       Date     Qty
0   Austin       A 2019-08-30   500.0
1   Austin       A 2019-08-31   500.0
2   Austin       A 2019-09-01   700.0
3   Austin       A 2019-09-02   700.0
4   Austin       A 2019-09-03   600.0
5   Austin       A 2019-09-04   600.0
6   Austin       B 2019-08-30  2000.0
7   Austin       B 2019-08-31  2000.0
8   Austin       B 2019-09-01  2000.0
9   Austin       B 2019-09-02  2100.0
10  Austin       B 2019-09-03  2100.0
11  Austin       B 2019-09-04  2200.0
12  Boston       A 2019-08-30    50.0
13  Boston       A 2019-08-31    50.0
14  Boston       A 2019-09-01    50.0
15  Boston       A 2019-09-02    60.0
16  Boston       A 2019-09-03    60.0
17  Boston       A 2019-09-04    60.0
Andy L.
  • 24,909
  • 4
  • 17
  • 29