Data Aggregation in Excel/Python/R

Question

I have data in the following format:

Date(dmy)  | Area   | Item_ID | Quantity
01-04-2016 | 201002 | AX      | 10
01-04-2016 | 560001 | ML      | 7
05-04-2016 | 201002 | AX      | 3
10-04-2016 | 201002 | WAP     | 16
30-04-2016 | 560001 | XY      | 9
07-05-2016 | 560001 | ML      | 20
10-05-2016 | 201002 | AX      | 1
17-05-2016 | 560001 | AX      | 2
31-05-2016 | 201002 | AX      | 10
.
.
.

I need to do Quantity aggregation on Item_IDs and Area level on a monthly level. i.e. for the above data, required aggregated Output should look like:

Date(monthyear)  | Area   | Item_ID | Quantity
04/2016          | 201002 | AX      | 13
04/2016          | 560001 | ML      | 7
04/2016          | 201002 | WAP     | 16
04/2016          | 560001 | XY      | 9
05/2016          | 560001 | ML      | 20
05/2016          | 201002 | AX      | 11
05/2016          | 560001 | AX      | 2

Intuition is: If (Area and Item_ID pair) is same in multiple rows of the same month, perform Quantity aggregation(sum).

Please help me perform this aggregation to get the result in this format itself. Thanks.

library(dplyr); df %>% group_by(month(Date), Area, Item_ID) %>% summarize(quantity=sum(quantity)) — Bill O'Brien, Jul 31 '20 at 18:41
Thank you all for the answers. :) Pivot table just worked as per my requirement. I have also performed the same thing in python as well using: data.groupby(['Year','Month','Area','Item_ID']).sum()[['Quantity']]. — Annu, Jul 31 '20 at 19:32

score 1 · Answer 1 · answered Jul 31 '20 at 18:53

Try this base R solution:

#Data
df <- structure(list(Date.dmy. = c("01/04/2016", "01/04/2016", "05/04/2016", 
"10/04/2016", "30/04/2016", "07/05/2016", "10/05/2016", "17/05/2016", 
"31/05/2016"), Area = c(201002L, 560001L, 201002L, 201002L, 560001L, 
560001L, 201002L, 560001L, 201002L), Item_ID = c("AX", "ML", 
"AX", "WAP", "XY", "ML", "AX", "AX", "AX"), Quantity = c(10L, 
7L, 3L, 16L, 9L, 20L, 1L, 2L, 10L)), row.names = c(NA, -9L), class = "data.frame")

#Code
#Format data
df$Date <- format(as.Date(df$Date.dmy.,'%d/%m/%Y'),'%m-%Y')
#Aggregate
AG <- aggregate(Quantity~Date+Area+Item_ID,data=df,sum,na.rm=T)

     Date   Area Item_ID Quantity
1 04-2016 201002      AX       13
2 05-2016 201002      AX       11
3 05-2016 560001      AX        2
4 04-2016 560001      ML        7
5 05-2016 560001      ML       20
6 04-2016 201002     WAP       16
7 04-2016 560001      XY        9

Onyambu · Answer 2 · 2020-07-31T19:12:50.963

In R using tidyverse:

library(lubridate)
library(tidyverse)
df %>%
  group_by(grp=format(dmy(`Date(dmy)`),"%m/%Y"),Area,Item_ID)%>%
  summarise(Quantity = sum(Quantity))
# A tibble: 7 x 4
# Groups:   grp, Area [4]
  grp       Area Item_ID Quantity
  <chr>    <int> <chr>      <int>
1 04/2016 201002 AX            13
2 04/2016 201002 WAP           16
3 04/2016 560001 ML             7
4 04/2016 560001 XY             9
5 05/2016 201002 AX            11
6 05/2016 560001 AX             2
7 05/2016 560001 ML            20

if using python, then:

import pandas as pd
df.assign(Date = pd.to_datetime(df.iloc[:,0], dayfirst = True).dt.strftime("%m/%Y")).\
   groupby(['Date','Area','Item_ID']).Quantity.sum().reset_index()

    Date    Area    Item_ID Quantity
0   04/2016 201002  AX  13
1   04/2016 201002  WAP 16
2   04/2016 560001  ML  7
3   04/2016 560001  XY  9
4   05/2016 201002  AX  11
5   05/2016 560001  AX  2
6   05/2016 560001  ML  20

In both of the cases you can order them the way you want

Data Aggregation in Excel/Python/R

2 Answers2