I have a data table containing information on the volumes for a varying product mix sold to different customers over several years.
The code for this Data structure is -
Data <- data.table(Region="USA", Customer=c("ABC","ABC","ABC","DEF","DEF","DEF","ABC","ABC","ABC","XYZ","XYZ"), ProductName=c("Radio123","Radio234","Radio345","Radio123","Radio234","Radio345","BB123","BB234","BB345","Radio123","BB123"),CustomerID=c("123","123","123","125","125","125","123","123","123","124","124"),'2015'=c(0,123,0,0,0,0,4,0,0,0,0),'2016'=c(0,0,0,0,0,0,10093,0,0,234,0),Class=c("R1","R1","R1","R1","R1","R1","R2","R2","R2","R1","R1"))
For the sake of simplicity, I have only kept 2015 and 2016. As you can see, there are several different types (6) of products listed under the Product Name and the corresponding volumes sold are listed under the Year Number. There is a key to differentiate the customers and that is the Customer ID. You can also see under the Class that the different products essentially fall into two categories, i.e., R1 and R2.
What I would like to do is to aggregate the data on the Customer ID so I can get a total of the volume sold each year to every customer, i.e., have one row only for each customer.
This i can do with the following -
Data[, lapply(.SD, sum, na.rm = TRUE), .SDcols = c("2015", "2016"), by = c("Region", "Customer ID", "Customer")]
However, instead of a simple aggregate, I would like to pick a combination of 3R1+1R2 type and consider that as one unit of volume sold. Eg: If I aggregate for Customer ID = 123, I will get 127 (123 Radio234 + 4 BB123) under column 2015 and 10093 (BB123) under column 2016 etc. Instead I would like to show 1 unit as 3 Radio234 + 1 BB123 of complete pairs (4 under 2015 in this example) and in another column show the remaining incomplete pairs (111 Radio234 under 2015 in this example).
Any idea how this can be achieved in R? Should I use a simple divide function to get a count of complete groups in each Class and then a min to get the value of complete pairs?
I am unsure of how to program it but this is the logical I am thinking of -
1) Aggregate Class based on Customer ID which gives exactly two rows per Customer, i.e., 1 for volumes sold of type R1 and one for R2.
2) Divide R1 by 3 to get a count of complete sets that can be formed using R1 and then do a min(R1,R2) to see how many sets we can make. Eg: For 2015, we have 123/3 giving us a dividend of 41 in case of R1 and we see we have 4 R2. Then we do a min(41,4) which gives us the value of complete sets we can make using the formula (3R1+R2), thus giving us 4. We are left with 123-(3x4) = 111 R1 Class products or 37 incomplete pairs (missing R2).
Possible Output -
Data_Agg <- data.table(Region="USA",
Customer=c("ABC","DEF","XYZ"),
CustomerID=c("123","125","124"),
'2015_complete'= c(4,0,0),
'2016_complete'=c(0,0,0),
'2015_incomplete'=c("37_R1",0,0),
'2016_incomplete'=c("10093_R2",0,"78_R1"))
> Data_Agg
# Region Customer CustomerID 2015_complete 2016_complete 2015_incomplete 2016_incomplete
#1: USA ABC 123 4 0 37_R1 10093_R2
#2: USA DEF 125 0 0 0 0
#3: USA XYZ 124 0 0 0 78_R1
Or is there a more efficient way to do this? As an extension, I would also like to create an array or key value pair where I can store the Product Name composition of each row aggregate (how many Radio123 or Radio234 or BB123 make up a particular aggregated row).
Thanks for the help!