0

I have a dataset below where I need to subtract the premium to refund (Premium - Refund). From 2 duplicate (Product_Code) rows. I will get the premium from the row that has "New" Status and then the Refund will come from "Canceled" status. Please see my dataset below.

Product_Code <- c("1A","1B","1D","1A","1C","1D","1F","1G","1B","1H")
Status <- c("New", "New","New","Canceled","New","Canceled","New","New",
            "Canceled", "New")
Premium <- c(1200,1500,2000,0,1000,0,1400,1600,0,1300)
Refund <- c(0,0,0,800,0,1500,0,0,900,0)
DataSet <- data.frame(Product_Code, Status, Premium, Refund).
> DataSet
   Product_Code   Status Premium Refund
1            1A      New    1200      0
2            1B      New    1500      0
3            1D      New    2000      0
4            1A Canceled       0    800
5            1C      New    1000      0
6            1D Canceled       0   1500
7            1F      New    1400      0
8            1G      New    1600      0
9            1B Canceled       0    900
10           1H      New    1300      0

My desired outcome will be is to create a New Dataset. Where if the Product_Code has been duplicated (There is New and Canceled Status) the Premium from "New" Status will be subtracted by the Refund from the "Canceled" Status. And then collapsing to just one Product_Code removing the Canceled row. The new Premium will be Premium (from New Status) - Refund (from Canceled status). Please see desired output below.

> DataSet
  Product_Code Status Premium Refund
1           1A    New     400      0
2           1B    New     600      0
3           1D    New     500      0
4           1C    New    1000      0
5           1F    New    1400      0
6           1G    New    1600      0
7           1H    New    1300      0
Bustergun
  • 977
  • 3
  • 11
  • 17

1 Answers1

1

Using dplyr, we can sum Premium and Refund column and subtract it from each other.

library(dplyr)

DataSet %>%
  group_by(Product_Code) %>%
  summarise(Status = "New", 
            Premium = sum(Premium) - sum(Refund), 
            Refund = 0)

# A tibble: 7 x 4
#  Product_Code Status Premium Refund
#  <fct>        <chr>    <dbl>  <dbl>
#1 1A           New        400      0
#2 1B           New        600      0
#3 1C           New       1000      0
#4 1D           New        500      0
#5 1F           New       1400      0
#6 1G           New       1600      0
#7 1H           New       1300      0

We can use the same logic in data.table as well

library(data.table)

setDT(DataSet)
DataSet[,.(Premium = sum(Premium) - sum(Refund), Status = "New", 
           Refund = 0), Product_Code]

and in base R using aggregate

transform(aggregate(Premium~Product_Code, transform(DataSet, 
          Premium = Premium - Refund), sum), Status = "New", Refund = 0)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213