0

recent I have a request to find noise numbers (up to 20) within a large dataset (up to 10,000), so that the sum of the rest dataset should equal to 0. I understand itertools.combinations can do the job only if I have a super powerful PC.

eg. dataset= [1,2,3,4,5,-5,-4,-3,-2,-1,9,8,7, .....] (len can be upto 10,000). And Noise numbers are [9,8,7], because without noise numbers, the sum of the dataset is 0. By the way, in my real case, the numbers are transaction numbers, so they appear pretty random.

My question is there a feasible way for me to perform such task within a fairly short time (like under a few minutes)

Thank you very much

S.Gu
  • 637
  • 1
  • 6
  • 15
  • What is a noise number - and can you elaborate what calculations exactly you have to perform? (some example(s))? – Gwang-Jin Kim Jul 01 '21 at 08:44
  • Thank you, I provided an example into my question. – S.Gu Jul 01 '21 at 08:50
  • Sounds NP hard, maybe some greedy approach could be faster, but not giving 100% the most correct result – Alexey S. Larionov Jul 01 '21 at 08:51
  • are the numbers floats or integers? – Gwang-Jin Kim Jul 01 '21 at 08:53
  • I'm pretty sure there is some very similar problem from competitive programming, somewhere on codeforces.com or leetcode.com. Maybe [this also](https://www.geeksforgeeks.org/minimum-number-of-elements-to-be-removed-such-that-the-sum-of-the-remaining-elements-is-equal-to-k/) – Alexey S. Larionov Jul 01 '21 at 08:57
  • Still not very clear - how would you decide that `[9, 8, 7]` are noise numbers and not `[3, 3, 3, 2, 2, 2, 2, 1, 1, 2, 3]` ? – Mortz Jul 01 '21 at 09:00
  • they are float, but does it matter anyway? – S.Gu Jul 01 '21 at 09:00
  • How about calculating a sum for the whole `dataset` and then finding **minimum number** (@Mortz) of [elements that would get that number](https://stackoverflow.com/a/3421173/1185254)? In the example: sum == 24, find how we can get 24 with a subset of elements (they would be the Noise Numbers) – alex Jul 01 '21 at 09:00
  • it can be, but in the real world there's only 1 combination is the bunch of noise numbers – S.Gu Jul 01 '21 at 09:02
  • ```[1,2,3,4,-10,8]``` in this list the **noise number is 8**. Am I correct ? – Ram Jul 01 '21 at 09:42

1 Answers1

2

There's two possible interpretations here.

  1. The dataset contains almost entirely numbers x and -x and you want to identify those numbers where the opposite sign does not occur. This is simple:

    positive = {x for x in dataset if x > 0}
    negative = {-x for x in dataset if x < 0}
    unpaired = (positive - negative) | {-x for x in negative - positive}
    
  2. The above special form does not apply and you wish to find the largest subset that sums to 0. This problem is a variant of the well-known subset sum problem, and is NP-hard in general (even just finding any solution, let alone the largest). You can maybe apply one of the pseudopolynomial time algorithms described here.

orlp
  • 112,504
  • 36
  • 218
  • 315
  • Thanks, but my real case contains all transaction numbers, which can appear pretty random – S.Gu Jul 01 '21 at 09:06
  • @S.Gu Then without further special properties my #2 case applies and you are - in the most general case where the numbers can get big - screwed. The best you can hope for is `O(nS)` where `S` is the sum of absolute values of your array. – orlp Jul 01 '21 at 09:07