-1

I have data in the following format:

Date(dmy)  | Area   | Item_ID | Quantity
01-04-2016 | 201002 | AX      | 10
01-04-2016 | 560001 | ML      | 7
05-04-2016 | 201002 | AX      | 3
10-04-2016 | 201002 | WAP     | 16
30-04-2016 | 560001 | XY      | 9
07-05-2016 | 560001 | ML      | 20
10-05-2016 | 201002 | AX      | 1
17-05-2016 | 560001 | AX      | 2
31-05-2016 | 201002 | AX      | 10
.
.
.

I need to do Quantity aggregation on Item_IDs and Area level on a monthly level. i.e. for the above data, required aggregated Output should look like:

Date(monthyear)  | Area   | Item_ID | Quantity
04/2016          | 201002 | AX      | 13
04/2016          | 560001 | ML      | 7
04/2016          | 201002 | WAP     | 16
04/2016          | 560001 | XY      | 9
05/2016          | 560001 | ML      | 20
05/2016          | 201002 | AX      | 11
05/2016          | 560001 | AX      | 2

Intuition is: If (Area and Item_ID pair) is same in multiple rows of the same month, perform Quantity aggregation(sum).

Please help me perform this aggregation to get the result in this format itself. Thanks.

Annu
  • 13
  • 2
  • 1
    use a pivot table – Scott Craner Jul 31 '20 at 18:40
  • library(dplyr); df %>% group_by(month(Date), Area, Item_ID) %>% summarize(quantity=sum(quantity)) – Bill O'Brien Jul 31 '20 at 18:41
  • Thank you all for the answers. :) Pivot table just worked as per my requirement. I have also performed the same thing in python as well using: data.groupby(['Year','Month','Area','Item_ID']).sum()[['Quantity']]. – Annu Jul 31 '20 at 19:32

2 Answers2

1

Try this base R solution:

#Data
df <- structure(list(Date.dmy. = c("01/04/2016", "01/04/2016", "05/04/2016", 
"10/04/2016", "30/04/2016", "07/05/2016", "10/05/2016", "17/05/2016", 
"31/05/2016"), Area = c(201002L, 560001L, 201002L, 201002L, 560001L, 
560001L, 201002L, 560001L, 201002L), Item_ID = c("AX", "ML", 
"AX", "WAP", "XY", "ML", "AX", "AX", "AX"), Quantity = c(10L, 
7L, 3L, 16L, 9L, 20L, 1L, 2L, 10L)), row.names = c(NA, -9L), class = "data.frame")

#Code
#Format data
df$Date <- format(as.Date(df$Date.dmy.,'%d/%m/%Y'),'%m-%Y')
#Aggregate
AG <- aggregate(Quantity~Date+Area+Item_ID,data=df,sum,na.rm=T)

     Date   Area Item_ID Quantity
1 04-2016 201002      AX       13
2 05-2016 201002      AX       11
3 05-2016 560001      AX        2
4 04-2016 560001      ML        7
5 05-2016 560001      ML       20
6 04-2016 201002     WAP       16
7 04-2016 560001      XY        9
Duck
  • 39,058
  • 13
  • 42
  • 84
0

In R using tidyverse:

library(lubridate)
library(tidyverse)
df %>%
  group_by(grp=format(dmy(`Date(dmy)`),"%m/%Y"),Area,Item_ID)%>%
  summarise(Quantity = sum(Quantity))
# A tibble: 7 x 4
# Groups:   grp, Area [4]
  grp       Area Item_ID Quantity
  <chr>    <int> <chr>      <int>
1 04/2016 201002 AX            13
2 04/2016 201002 WAP           16
3 04/2016 560001 ML             7
4 04/2016 560001 XY             9
5 05/2016 201002 AX            11
6 05/2016 560001 AX             2
7 05/2016 560001 ML            20

if using python, then:

import pandas as pd
df.assign(Date = pd.to_datetime(df.iloc[:,0], dayfirst = True).dt.strftime("%m/%Y")).\
   groupby(['Date','Area','Item_ID']).Quantity.sum().reset_index()

    Date    Area    Item_ID Quantity
0   04/2016 201002  AX  13
1   04/2016 201002  WAP 16
2   04/2016 560001  ML  7
3   04/2016 560001  XY  9
4   05/2016 201002  AX  11
5   05/2016 560001  AX  2
6   05/2016 560001  ML  20

In both of the cases you can order them the way you want

Onyambu
  • 67,392
  • 3
  • 24
  • 53