I have a dataframe of over 400,000 rows, ranging from January to December 2020, with the following being a snippet over three months:
tab <- structure(list(Date = structure(c(1580947200, 1581033600, 1581120000,
1581206400, 1581292800, 1581379200, 1581465600, 1581552000, 1581638400,
1583798400, 1583884800, 1583971200, 1584057600, 1584144000, 1584230400,
1584316800, 1584403200, 1587168000, 1587254400, 1587340800, 1587427200,
1587513600, 1587600000, 1587686400), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), `Product Type` = c("LIZX", "LIZX", "LIZX",
"LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX",
"LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX",
"LIZX", "LIZX", "LIZX", "LIZX", "LIZX"), Account = c(931, 931,
931, 931, 931, 931, 931, 931, 931, 931, 931, 931, 931, 931, 931,
931, 931, 931, 931, 931, 931, 931, 931, 931), Subsidiary = c(124,
124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124,
124, 124, 124, 124, 124, 124, 124, 124, 124, 124), Description = c("Transaction",
"Transaction X", "Transaction", "Transaction", "Transaction X",
"Transaction", "Transaction", "Transaction", "Transaction", "Transaction",
"Transaction", "Transaction", "Transaction", "Transaction", "Transaction",
"Transaction", "Transaction", "Transaction", "Transaction", "Transaction",
"Transaction", "Transaction", "Transaction", "Transaction"),
`Policy Number` = c(42057926, 42057926, 42057926, 42057926,
42057926, 42057926, 42057926, 42057926, 42057926, 42060466,
42060466, 42060466, 42060466, 42060466, 42060466, 42060466,
42060466, 42060467, 42060467, 42060467, 42060467, 42060467,
42060467, 42060467), Amount = c(10, -10, 20, -20, 30, 24,
23, 22, -0.56, 1, -1, 2, -2, 2, 3, 4, -1, 3, -3, -3, -3,
-3, -3, -3)), row.names = c(NA, -24L), class = c("tbl_df",
"tbl", "data.frame"))
For some policies (denoted by the policy number), there are reversed transactions, where in the same month, the exact same amount is debited as a positive amount and later removed as a negative amount, or vice versa.
Essentially, I would like to identify and then remove all such reversals from the dataframe. Please note that the pairs of reveral transactions are not always in adjacent rows.
I have tried following the advice given on this related post, but it doesn't seem to address this particular type of dataframe. The biggest issue I am having is that when I try to identify duplicates based on the policy number and amount, all duplicates are tagged, rather than just one pair.
For example, for the policy 42060467, there are 6 transactions of (-3) and one transaction of (3) within the same month. Therefore, I would like one (-3) transaction and one (3) transaction removed, leaving 5 (3) transactions.
Any advice on this would be greatly appreciated!