
I am working with UPC (product number), quantity_picked, and date_expected columns, and I need my data organized to show the total quantity_picked per day (for every day) for each UPC. Example data below:

                 UPC  quantity_picked       date_expected
0      0001111041660              1.0 2019-05-14 15:00:00
1      0001111045045              1.0 2019-05-14 15:00:00
2      0001111050268              1.0 2019-05-14 15:00:00
3      0001111086132              1.0 2019-05-14 15:00:00
4      0001111086983              1.0 2019-05-14 15:00:00
5      0001111086984              1.0 2019-05-14 15:00:00
             ...              ...                 ...
39694  0004470036000              6.0 2019-06-24 20:00:00
39695  0007225001116              1.0 2019-06-24 20:00:00

I was able to organize my data this way using the code below, but the output leaves out dates where quantity_picked is 0:

orders = pd.read_sql_query(SQL, con=sql_conn)
order_daily = orders.copy()
# Truncate timestamps to midnight so rows group by calendar day
order_daily['date_expected'] = order_daily['date_expected'].dt.normalize()

# Groups by UPC and date, getting the sum of quantity picked for each,
# then resets the index to turn the group keys back into columns
tipd = order_daily.groupby(['UPC', 'date_expected']).sum().reset_index()
# Rearranging of columns to put UPC column first
tipd = tipd[['UPC','date_expected','quantity_picked']]

This gives the following output:

                 UPC date_expected  quantity_picked
0      0000000002554    2019-05-21              4.0
1      0000000002554    2019-05-24              2.0
2      0000000002554    2019-06-02              2.0
3      0000000002554    2019-06-17              2.0
4      0000000003082    2019-05-15              2.0
5      0000000003082    2019-05-16              2.0
6      0000000003082    2019-05-17              8.0
             ...           ...              ...
31588  0360600051715    2019-06-17              1.0
31589  0501072452748    2019-06-15              1.0
31590  0880100551750    2019-06-07              2.0

When I try to follow the solution given in Pandas filling missing dates and values within group, I adjust my code to:

tipd = order_daily.groupby(['UPC', 'date_expected']).sum().reindex(idx, fill_value=0).reset_index()
# Rearranging of columns to put UPC column first
tipd = tipd[['UPC','date_expected','quantity_picked']]
# Viewing first 10 rows to check format of dataframe
print('Preview of Total per Item per Day')
print(tipd.iloc[0:10])

and receive the following error:

TypeError: Argument 'tuples' has incorrect type (expected numpy.ndarray, got DatetimeArray)

I need every date to be listed for every product, even when quantity_picked is zero. I plan on creating two new columns using .shift and .diff for calculations, and those columns will not be accurate if my data skips dates.
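For reference, here is a minimal, self-contained sketch of what I think `idx` needs to be: a MultiIndex covering every (UPC, date) combination, built automatically from the earliest to the latest date present (the column names and sample values below are just illustrative, not my real data). On this small sample it avoids the TypeError, but I'm not sure it's the idiomatic approach:

```python
import pandas as pd

# Tiny sample mirroring the shape of my data
order_daily = pd.DataFrame({
    'UPC': ['0001111041660', '0001111041660', '0007225001116'],
    'quantity_picked': [1.0, 2.0, 6.0],
    'date_expected': pd.to_datetime(
        ['2019-05-14', '2019-05-16', '2019-05-15']),
})

# Sum of quantity picked per UPC per day (MultiIndexed Series)
tipd = order_daily.groupby(['UPC', 'date_expected'])['quantity_picked'].sum()

# Build idx as every (UPC, date) pair from the earliest to the latest
# date in the data, so new dates are picked up automatically
all_dates = pd.date_range(order_daily['date_expected'].min(),
                          order_daily['date_expected'].max(), freq='D')
idx = pd.MultiIndex.from_product(
    [order_daily['UPC'].unique(), all_dates],
    names=['UPC', 'date_expected'])

# Reindex against the full grid, filling missing days with 0
tipd = tipd.reindex(idx, fill_value=0).reset_index()
tipd = tipd[['UPC', 'date_expected', 'quantity_picked']]
```

With 2 UPCs and a 3-day range, this yields 6 rows, including rows like UPC 0007225001116 on 2019-05-14 with quantity_picked = 0.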

Any guidance is very much appreciated.

Emily Reed
  • What is `idx` in your code? – Peter Leimbigler Jul 01 '19 at 19:40
  • Also, it would really help if you could post some example data as text (since nobody else has your SQL database), along with corresponding expected output. – Peter Leimbigler Jul 01 '19 at 19:42
  • @PeterLeimbigler I have added the output that I get when I do print(orders); this is what the data looks like both in SQL and once it is queried over. I have realized I should have defined `idx` with a date range, but I want it to automatically bring over data associated with new dates, and I am unsure how to define `idx` to automatically go from 2019-05-14 to the latest date available. – Emily Reed Jul 01 '19 at 19:49
  • @Parfait SQL Server. – Emily Reed Jul 01 '19 at 19:54
  • @Parfait I do not believe so. When we carry out that query we get repeats of the same UPC on the same date from multiple orders, and we do not have every UPC listed for every date with quantity_picked = 0. – Emily Reed Jul 01 '19 at 20:54
  • I have made much more progress in SQL and will adjust my query and examine the data closer to verify, and then will come back and update this post tomorrow. – Emily Reed Jul 01 '19 at 20:56

0 Answers