I am working with UPC (product#), date_expected, and quantity_picked columns and need my data organized to show the total quantity_picked per day (for every day) for each UPC. Example data below:
UPC quantity_picked date_expected
0 0001111041660 1.0 2019-05-14 15:00:00
1 0001111045045 1.0 2019-05-14 15:00:00
2 0001111050268 1.0 2019-05-14 15:00:00
3 0001111086132 1.0 2019-05-14 15:00:00
4 0001111086983 1.0 2019-05-14 15:00:00
5 0001111086984 1.0 2019-05-14 15:00:00
... ... ...
39694 0004470036000 6.0 2019-06-24 20:00:00
39695 0007225001116 1.0 2019-06-24 20:00:00
I was able to successfully organize my data in this manner using the code below, but the output leaves out dates with quantity_picked=0
orders = pd.read_sql_query(SQL, con=sql_conn)
order_daily = orders.copy()
order_daily['date_expected'] = order_daily['date_expected'].dt.normalize()
order_daily['date_expected'] = pd.to_datetime(order_daily.date_expected, format='%Y-%m-%d')
# Groups by date and UPC getting the sum of quanitity picked for each
# then resets index to fill in dates for all rows
tipd = order_daily.groupby(['UPC', 'date_expected']).sum().reset_index()
# Rearranging of columns to put UPC column first
tipd = tipd[['UPC','date_expected','quantity_picked']]
gives the following output:
UPC date_expected quantity_picked
0 0000000002554 2019-05-21 4.0
1 0000000002554 2019-05-24 2.0
2 0000000002554 2019-06-02 2.0
3 0000000002554 2019-06-17 2.0
4 0000000003082 2019-05-15 2.0
5 0000000003082 2019-05-16 2.0
6 0000000003082 2019-05-17 8.0
... ... ...
31588 0360600051715 2019-06-17 1.0
31589 0501072452748 2019-06-15 1.0
31590 0880100551750 2019-06-07 2.0
When I try to follow the solution given in: Pandas filling missing dates and values within group I adjust my code to
tipd = order_daily.groupby(['UPC', 'date_expected']).sum().reindex(idx, fill_value=0).reset_index()
# Rearranging of columns to put UPC column first
tipd = tipd[['UPC','date_expected','quantity_picked']]
# Viewing first 10 rows to check format of dataframe
print('Preview of Total per Item per Day')
print(tipd.iloc[0:10])
And receive the following error:
TypeError: Argument 'tuples' has incorrect type (expected numpy.ndarray, got DatetimeArray)
I need each date to be listed for each product, even when quantity picked is zero. I plan on creating two new columns using .shift and .diff for calculations, and those columns will not be accurate if my data is skipping dates.
Any guidance is very much appreciated.