4

I'm trying to automatize/speed up a control process.

The best way to imagine this is like a little store: I know the total that's supposed to be left in the register at the end of the day (e.g. 150$), I have a list of sales, the sum of which should obviously match our daily total. Yet the list sometimes contains errors e.g. wrong sales.

To give an example, assume that the expected sum is 150$, but our 10 sales sum up to 153.37$. Now if there is a sale in the list that is exactly 3.37$ it is most likely wrong and I want my program to suggest to exclude this.

Now for my real world problem, "luckily" I have more post decimal digits for my sum, the likelihood of suggesting to exclude a wrong sale by accident is very low. On the other hand, I usually deal with up to ~250 sales, of which usually up to 20 can be wrong (so just for constrains think sales(n) =<250 and wrong trades(k) =<20).

In other terms, I have a set S {}. By excluding elements from the set, the Sum of elements in S should match a known Sum A, one can assume that there is an unique solution to this (So no "you could exclude Element B or simply Element D AND E [as B = E+D])

If this is a "known" problem to solve I'd also be happy if you can just give me a hint where to find the fitting literature.

  • 3
    I believe that this is the [subset Sum](https://en.wikipedia.org/wiki/Subset_sum_problem). If you count in pennies instead of dollars, then the target sum and all of the sales are integers which meetrs the requirements for the Subset Sum problem. – RBarryYoung Jun 04 '21 at 16:13
  • 3
    A simple dynamic programming solution is probably your best bet as it can solve the problem in `O(N*D)` (pseudo-polynomial time) where `N` is the number of sales and `D` is the size of the discrepancy (in pennies). – RBarryYoung Jun 04 '21 at 16:27
  • Let me know if you need me to provide example code for the `O(N*D)` dynamic programming solution. – RBarryYoung Jun 05 '21 at 11:31

1 Answers1

3

I believe you're just looking for combinations.

I would first calculate the difference: diff = sum(sales_list) - register_total

Then, filter the sales list for all elements in the set less than or equal to the difference: possibly_wrong_sales = sales list[ sales_list <= diff ]

Then find all possible combinations (and hopefully there is just 1) in the list/array of possibly_wrong_sales such that their sum equals that difference. That question has been answered here before, and answered very well too! The accepted answer includes full algorithms in lots of programming languages: Finding all possible combinations of numbers to reach a given sum

In literature, this is commonly referred to as the Coin Change Problem.

Larry Panozzo
  • 191
  • 2
  • 7
  • 3
    You may be thinking of the [Change-Making problem](https://en.wikipedia.org/wiki/Change-making_problem), but that problem is looking for a minimal number of coins (of pre-defined denomination) to reach a specific sum. This problem appears to be closer to the [Subset sum problem](https://en.wikipedia.org/wiki/Subset_sum_problem). (both are types of Knapsack problems) – RBarryYoung Jun 04 '21 at 16:16
  • As you said, the Change-Making problem looks for the minimal number of coins, though the wrong sales could be a different set of sales. The Subset sum problem finds a boolean and uses integers. – Larry Panozzo Jun 04 '21 at 16:50
  • If OP wants to get the most likely subset, maybe you can consider it's the one with the smallest number of mistakes, in which case you go back to the Change-Making problem. Otherwise you're right it's more the Subset Sum – gdelab Jun 04 '21 at 16:50
  • I have implemented both Subset Sum (SSP) and Change-Making (CMP) solutions in multiple languages and I am certain that this is a common SSP variant problem. CMP allows an unlimited number of coins of any denomination (receipts) whereas this problem only allows one receipt to be applied at most one time. There is an obscure CMP variant called the "0-1 CMP" that could be applied to this, but it's still easier to use the more common SSP variant that identifies the first solution. Also, the accepted answer at that link is general, but extremely slow at `O(n*2^n)`. – RBarryYoung Jun 05 '21 at 11:37
  • 1
    I was indeed referring to a Subset sum problem, as in my case sales can have a positive or negative value assigned to them. – Dontwannausemynormalnick Jun 07 '21 at 21:24