2

I have a dataframe of over 400,000 rows, ranging from January to December 2020, with the following being a snippet over three months:

 tab <- structure(list(Date = structure(c(1580947200, 1581033600, 1581120000, 
1581206400, 1581292800, 1581379200, 1581465600, 1581552000, 1581638400, 
1583798400, 1583884800, 1583971200, 1584057600, 1584144000, 1584230400, 
1584316800, 1584403200, 1587168000, 1587254400, 1587340800, 1587427200, 
1587513600, 1587600000, 1587686400), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), `Product Type` = c("LIZX", "LIZX", "LIZX", 
"LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", 
"LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", 
"LIZX", "LIZX", "LIZX", "LIZX", "LIZX"), Account = c(931, 931, 
931, 931, 931, 931, 931, 931, 931, 931, 931, 931, 931, 931, 931, 
931, 931, 931, 931, 931, 931, 931, 931, 931), Subsidiary = c(124, 
124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 
124, 124, 124, 124, 124, 124, 124, 124, 124, 124), Description = c("Transaction", 
"Transaction X", "Transaction", "Transaction", "Transaction X", 
"Transaction", "Transaction", "Transaction", "Transaction", "Transaction", 
"Transaction", "Transaction", "Transaction", "Transaction", "Transaction", 
"Transaction", "Transaction", "Transaction", "Transaction", "Transaction", 
"Transaction", "Transaction", "Transaction", "Transaction"), 
    `Policy Number` = c(42057926, 42057926, 42057926, 42057926, 
    42057926, 42057926, 42057926, 42057926, 42057926, 42060466, 
    42060466, 42060466, 42060466, 42060466, 42060466, 42060466, 
    42060466, 42060467, 42060467, 42060467, 42060467, 42060467, 
    42060467, 42060467), Amount = c(10, -10, 20, -20, 30, 24, 
    23, 22, -0.56, 1, -1, 2, -2, 2, 3, 4, -1, 3, -3, -3, -3, 
    -3, -3, -3)), row.names = c(NA, -24L), class = c("tbl_df", 
"tbl", "data.frame"))

For some policies (denoted by the policy number), there are reversed transactions, where in the same month, the exact same amount is debited as a positive amount and later removed as a negative amount, or vice versa.

Essentially, I would like to identify and then remove all such reversals from the dataframe. Please note that the pairs of reveral transactions are not always in adjacent rows.

I have tried following the advice given on this related post, but it doesn't seem to address this particular type of dataframe. The biggest issue I am having is that when I try to identify duplicates based on the policy number and amount, all duplicates are tagged, rather than just one pair.

For example, for the policy 42060467, there are 6 transactions of (-3) and one transaction of (3) within the same month. Therefore, I would like one (-3) transaction and one (3) transaction removed, leaving 5 (3) transactions.

Any advice on this would be greatly appreciated!

LoriDori
  • 147
  • 2
  • 8
  • 1
    Please clarify whether these pair of transactions are always as adjacent rows? Please also include some data as [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – AnilGoyal Feb 16 '21 at 14:30
  • Thanks @AnilGoyal they are not always in adjacent rows and I have added a reproducible data. – LoriDori Feb 16 '21 at 14:42

0 Answers0