I have transaction level data with some reversal transactions. These transactions are denoted by a negative amount and then a counterpart positive.
trnx_df <- data.frame(Date = c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-03", "2018-01-03", "2018-01-05", "2018-02-01",
"2018-02-01", "2018-02-01"),
Product = c("A", "A", "A", "A", "B", "B", "B", "A", "A", "A"),
Amount = c(-1000, 1000, 1000, 1000, -1000, 1000, 500, -2000, 1000, 2000))
trnx_df
Date Product Amount
1 2018-01-01 A -1000
2 2018-01-01 A 1000
3 2018-01-01 A 1000
4 2018-01-01 A 1000
5 2018-01-03 B -1000
6 2018-01-03 B 1000
7 2018-01-05 B 500
8 2018-02-01 A -2000
9 2018-02-01 A 1000
10 2018-02-01 A 2000
I want to arrive at total amount and maximum amount spent by that customer on particular product.
By using dplyr I arrive at:
library(dplyr)
trnx_summary <- trnx_df %>%
group_by(Product) %>%
summarize(Total_amount = sum(Amount),
Max_amount = max(Amount))
trnx_summary
Product Total_amount Max_amount
1 A 3000 2000
2 B 500 1000
For total there will be no problem as the negative entry will cancel out positive one but for maximum amount spent I will get wrong output.
The maximum amount for Product A should be 1000 (2000
and -2000
will cancel each other out).
How can I fix this? Also, is there a way to delete these reversal transactions from the dataframe itself?