We use boost::multiprecision::cpp_rational to represent fractions.

At some point, I have to determine whether a sum of a potentially huge number of fractions reaches 1 or not. I don’t care about the exact value of the sum, not even whether it is exactly 1 or above. The individual fractions are all strictly between 0 and 1.

Is there some clever math trick that I’m not aware of that spares me from actually having to calculate the sum?


There are some domain-specific considerations:

  • Getting a lot of fractions with the same denominator is the common case, not a rare happenstance.
  • The case where the sum is exactly 1 is a common case.
  • The sum exceeding 1 by a large amount (exceeding 2) is probably so rare it essentially never happens.

My optimization attempts so far:

As a first step, I already calculate partial sums in a std::map that “sorts” the fractions by denominator and adds same-denominator fractions first.

The partial sums are then sorted by size, largest first.

But after that, I just add them up and, after each addition, check whether the result is already greater than or equal to 1, quitting early if it is (that’s valid because the fractions are all positive).
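Something like the following minimal sketch (the function name and the `std::vector` input are illustrative, not from the actual code base):

```cpp
#include <boost/multiprecision/cpp_int.hpp>
#include <algorithm>
#include <functional>
#include <map>
#include <vector>

using boost::multiprecision::cpp_int;
using boost::multiprecision::cpp_rational;

// Returns true as soon as the running sum reaches 1.
bool reaches_one(const std::vector<cpp_rational>& fractions)
{
    // Group by denominator: adding same-denominator fractions only
    // touches numerators, which is cheap.
    std::map<cpp_int, cpp_int> numerators_by_denominator;
    for (const cpp_rational& f : fractions)
        numerators_by_denominator[denominator(f)] += numerator(f);

    // Turn the groups into partial sums and add the largest first,
    // so the early exit triggers as soon as possible.
    std::vector<cpp_rational> partials;
    for (const auto& [den, num] : numerators_by_denominator)
        partials.emplace_back(num, den);
    std::sort(partials.begin(), partials.end(), std::greater<>{});

    cpp_rational sum = 0;
    for (const cpp_rational& p : partials) {
        sum += p;
        if (sum >= 1)   // valid early exit: all fractions are positive
            return true;
    }
    return false;
}
```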

Quirin F. Schroll
  • I assume that you've tried the obvious "add up floating point approximations, and use that answer if your sum is, after roundoff, either clearly below or above 1." Also try sorting by denominator and see if that reduces the number of terms usefully. But if the answer is exactly 1, you'll have to do the exact calculation. – btilly Aug 10 '23 at 18:31
  • Well, simply, no. If the sum is <=1, then you need to read all numbers (unless you have precomputed some quantity like partial sums before) since the last unread number can be the one changing the final outcome. If the sum is greater than 1, then you do not need to read all the numbers (sorting helps), but since you said this is very rare, then it is not useful to optimize the algorithm for this specific use-case. You can use approximation/heuristics to discard the ones that quickly get much bigger than one or much smaller than one, but for exact solutions =1, this requires the full computation – Jérôme Richard Aug 10 '23 at 18:31
  • Are you guaranteed that the numerator and denominator of each fraction are integers? If so, there might be some interesting tricks. But I'm with the others – why wouldn't you want to just calculate the sum of each fraction? – selbie Aug 10 '23 at 18:56
  • *At some point, I have to determine if a sum of a potentially huge number of fractions reaches 1 or not.* -- Where did these "huge number of fractions" come from? Why not calculate the total straight from how/when/where you retrieved these fractions? Or why not keep a running total whenever a fraction is added to the list (or subtracted from the list) of fractions? – PaulMcKenzie Aug 10 '23 at 19:06
  • Calculate the difference. 1 minus every number in the list – Thomas Weller Aug 10 '23 at 19:10
  • "spares me from actually having to calculate the sum?" - this sounds like a performance problem. Unless you have clear performance requirements and as long as you haven't measured this, it's an invalid performance problem. – Thomas Weller Aug 10 '23 at 19:12
  • I haven't used the boost library, so I have to ask a couple obvious questions. 1) The fractions are stored as a numerator denominator pair, right? 2) Do you have an easy way to check how many digits are in the denominator? – user3386109 Aug 10 '23 at 19:13
  • @ThomasWeller I had that idea, profiled it and it didn’t show a difference. – Quirin F. Schroll Aug 10 '23 at 19:48
  • @PaulMcKenzie “Where did these "huge number of fractions" come from?” Customer input. The run-of-the-mill user input is not huge, but again, the small ones are not the problem. The whole input is in a government-standardized format. – Quirin F. Schroll Aug 10 '23 at 19:51
  • You didn't understand. Again: what are your performance requirements? You haven't mentioned any numbers and not any profiling data in the question. – Thomas Weller Aug 10 '23 at 19:54
  • @JérômeRichard Most definitely not “simply”. I’ve seen really clever tricks in my computer science lectures. As a rough idea of what I mean, there could be a criterion like: “Add all the numerators and all the denominators (separately); if the resulting fraction is greater than 1, so would be the sum, however, if it’s smaller than 1, it means nothing.” I’m quite sure that this exposition is not a real criterion, but I hope you got the idea. – Quirin F. Schroll Aug 10 '23 at 19:59
  • @ThomasWeller, I don’t have direct performance requirements. It’s not a real-time application. My goal isn’t micro-optimizations that save 2%. My question was if I’ve been missing a clever trick, something that could give me run-time reductions on the order of 90%. Think hash map versus linear search in a vector of key-value pairs. – Quirin F. Schroll Aug 10 '23 at 20:02
  • So the crux of the question is: How best to avoid `N` floating-point divisions? – Mooing Duck Aug 10 '23 at 20:07
  • Is there a known range of the denominators? Are they all between say 1 and 1000? Are they all powers of 10? Etc? – Mooing Duck Aug 10 '23 at 20:09
  • without making assumptions about the denominator, I don't think you can beat the [map summing of same denominators first](http://coliru.stacked-crooked.com/a/70126836bb77f7ed). It's unclear if "sorting" and checking after each division would be faster (fewer divisions) or slower (more branching+cache). – Mooing Duck Aug 10 '23 at 20:21
  • @QuirinF.Schroll, would you explain what your actual application is? – lastchance Aug 10 '23 at 20:55
  • It really matters what the denominators are. If there aren't that many unique denominators you could add all the numerators for fractions with the same denominator. And if all the denominators are multiples of each other (like powers of 2 or 10) then you could just add the numerator sum multiple times. – Jerry Jeremiah Aug 10 '23 at 21:22
  • If you care about performance, then you should store the values in a contiguous array-like data structure like a `std::vector` (clearly not `std::map`) and try to use SIMD optimisations (AVX-2 registers can hold 8x 32-bit integers). If there are a lot of fractions, you can do a parallel reduction. Fractions can be reduced using the GCD and the LCM so as not to produce big numbers (unless you have a lot of prime numbers). As for the trick, AFAIK, the only one is that if the sum of the weighted numerators is equal to the product of the denominators, then the sum of the fractions is 1 (see the sketch after these comments). – Jérôme Richard Aug 10 '23 at 21:34
  • Note that if you want a pure-math trick, then it is better to ask it on https://mathoverflow.net . – Jérôme Richard Aug 10 '23 at 21:43
  • Modular arithmetic can certainly be used to avoid operating on big numbers, but this method will be a bit inefficient since modulo operations are slow. It can only be used to check whether the sum is exactly 1 or not (the ordering is lost in modular rings). – Jérôme Richard Aug 10 '23 at 21:44
  • Is there an upper bound on the number of unique denominators? What about the range of denominators? In theory, this problem should not require _any_ floating point divisions, since during the addition stage you only need to check if the numerator becomes as large as the denominator. You may even choose to memoize GCD for common pairs of denominators, which might speed up future calculations if this process happens repeatedly. – paddy Aug 10 '23 at 23:04
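
For illustration, here is a minimal sketch of the weighted-numerator criterion from Jérôme Richard's comment, combined with the GCD/LCM reduction both comments mention (the function name is illustrative; this is roughly what a rational type does internally on addition):

```cpp
#include <boost/multiprecision/cpp_int.hpp>
#include <vector>

using boost::multiprecision::cpp_int;
using boost::multiprecision::cpp_rational;

// Exact test for "sum == 1": over the common denominator, the sum is 1
// iff the sum of the weighted numerators equals the product of the
// denominators. Reducing by GCDs keeps intermediate numbers small.
bool sums_to_exactly_one(const std::vector<cpp_rational>& fractions)
{
    cpp_int n = 0, d = 1;                  // running sum n/d in lowest terms
    for (const cpp_rational& f : fractions) {
        cpp_int fn = numerator(f);
        cpp_int fd = denominator(f);
        cpp_int l = d / gcd(d, fd) * fd;   // lcm(d, fd)
        n = n * (l / d) + fn * (l / fd);   // weighted numerators
        d = l;
        cpp_int g = gcd(n, d);             // reduce the running sum
        n /= g;
        d /= g;
    }
    return n == d;                         // i.e. the sum is exactly 1
}
```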

1 Answer

My idea is to test whether the sum of the fractions S exceeds 1 by calculating upper and lower bounds, S_U and S_L respectively. You choose a large denominator D, then for each fraction n_i/d_i you find s_i such that

s_i/D <= n_i/d_i <= (s_i + 1)/D    (i.e. s_i = floor(n_i * D / d_i))

and calculate

S_L = sum{i=1 to n} s_i/D

and

S_U = sum{i=1 to n} (s_i + 1)/D

Since S_L <= S <= S_U, the bounds can be conclusive: if S_U < 1, then S < 1; if S_L >= 1, then S >= 1, i.e. the sum reaches 1. Otherwise, the test is inconclusive and you have to do it the long way (adding up all the fractions exactly), try again with a larger D, or use a 'probabilistic sum-to-unity test' (see below) to 'prove' that the sum is exactly 1.

With this test, you still have to go through all the fractions, but you avoid massive intermediate results, so it should be faster whenever it is conclusive. Floating-point arithmetic is only approximate, so either avoid it or use it with careful attention to rounding directions (round the lower bound down and the upper bound up).
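
A minimal sketch of this bounding test, phrased for the question's "does the sum reach 1" check and using only integer arithmetic (the names and the `std::optional` interface are illustrative):

```cpp
#include <boost/multiprecision/cpp_int.hpp>
#include <optional>
#include <vector>

using boost::multiprecision::cpp_int;
using boost::multiprecision::cpp_rational;

// Returns true/false if the bounds are conclusive about "sum >= 1",
// std::nullopt if they straddle 1 and the exact sum is still needed.
std::optional<bool> bounds_test(const std::vector<cpp_rational>& fractions,
                                const cpp_int& D)
{
    cpp_int lower = 0;                     // D * S_L, i.e. the sum of the s_i
    for (const cpp_rational& f : fractions)
        lower += numerator(f) * D / denominator(f);    // s_i, rounded down

    cpp_int upper = lower + cpp_int(fractions.size()); // D * S_U

    if (upper < D)  return false;          // S <= S_U < 1
    if (lower >= D) return true;           // S >= S_L >= 1
    return std::nullopt;                   // inconclusive
}
```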

You could construct a 'probabilistic sum-to-unity test' using modular arithmetic. Take a prime number P at random such that P does not divide any of the d_i. Then you can calculate S mod P by summing the fractions mod P, each fraction n_i/d_i contributing n_i * d_i^(-1) mod P. If the choice of P is random enough, then you can assume that the probability that S != 1 yet S = 1 mod P is about 1/P. Moreover, if P_1 and P_2 are two such primes, then the probability that S != 1 but S = 1 mod P_1 and S = 1 mod P_2 is about 1/(P_1 * P_2). This way, it should be easy to obtain a vanishingly small probability that S != 1 when S = 1 mod P_j for all the chosen primes P_j.
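
A minimal sketch for one prime P, assuming P is prime and divides none of the denominators so that the modular inverse can be taken via Fermat's little theorem (`powm` is Boost.Multiprecision's modular exponentiation; the function name is illustrative):

```cpp
#include <boost/multiprecision/cpp_int.hpp>
#include <vector>

using boost::multiprecision::cpp_int;
using boost::multiprecision::cpp_rational;

// Returns true if S = 1 (mod P). A 'false' proves S != 1; a 'true' only
// makes S = 1 likely, so repeat with several independently chosen primes.
bool sum_is_one_mod_p(const std::vector<cpp_rational>& fractions,
                      const cpp_int& P)
{
    const cpp_int exponent = P - 2;        // d^(P-2) = d^(-1) mod P (Fermat)
    cpp_int acc = 0;
    for (const cpp_rational& f : fractions) {
        cpp_int d = denominator(f) % P;    // nonzero by assumption on P
        cpp_int inv = powm(d, exponent, P);
        acc = (acc + numerator(f) % P * inv) % P;
    }
    return acc == 1;
}
```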

Simon Goater