I have the following data frame and want to:
- Group records by
month
- Sum
QTY_SOLD
andNET_AMT
of each uniqueUPC_ID
(per month) - Include the rest of the columns as well in the resulting dataframe
The way I thought I can do this is 1st: create a month
column to aggregate the D_DATES
, then sum QTY_SOLD
by UPC_ID
.
Script:
# Convert date to date time object
df['D_DATE'] = pd.to_datetime(df['D_DATE'])
# Create aggregated months column
df['month'] = df['D_DATE'].apply(dt.date.strftime, args=('%Y.%m',))
# Group by month and sum up quantity sold by UPC_ID
df = df.groupby(['month', 'UPC_ID'])['QTY_SOLD'].sum()
Current data frame:
UPC_ID | UPC_DSC | D_DATE | QTY_SOLD | NET_AMT
----------------------------------------------
111 desc1 2/26/2017 2 10 (2 x $5)
222 desc2 2/26/2017 3 15
333 desc3 2/26/2017 1 4
111 desc1 3/1/2017 1 5
111 desc1 3/3/2017 4 20
Desired Output:
MONTH | UPC_ID | QTY_SOLD | NET_AMT | UPC_DSC
----------------------------------------------
2017-2 111 2 10 etc...
2017-2 222 3 15
2017-2 333 1 4
2017-3 111 5 25
Actual Output:
MONTH | UPC_ID
----------------------------------------------
2017-2 111 2
222 3
333 1
2017-3 111 5
...
Questions:
- How do I include the month for each row?
- How do I include the rest of the columns of the dataframe?
- How do also sum
NET_AMT
in addition toQTY_SOLD
?