2

I have a rather gnarly bit of code that must more-or-less randomly generate a bunch of percentages, stored as decimal floats. That is, it decides that material one makes up 13.307 percent of the total, then stores that in a dict as 0.13307.

The trouble is, I can never get those numbers to add up to exactly one. I'm not entirely certain what the problem is, honestly. It might be something to do with the nature of floats.

Here's the offending code, in all its overcomplicated glory:

while not sum(atmosphere.values())>=1:
    #Choose a material randomly
    themat=random.choice(list(materials.values()))

    #If the randomly chosen material is gaseous at our predicted temperature...
    if themat.vapor < temp:
        #Choose a random percentage that it will make up of our planet's atmosphere, then put it in the atmos dict.
        atmosphere[themat]=round(random.uniform(0.001,0.5),5)

#Find out if the fractions add up to more than 1
difference=(sum(atmosphere.values())-1)
#If one does...
while difference > 0:
    #Choose a random constituent
    themat=random.choice(list(atmosphere.keys()))
    #If that constituent has a higher fraction value than the amount we'd need to reduce the total to 1...
    if atmosphere[themat]>(sum(atmosphere.values())-1):
        #Subtract that much from it.
        atmosphere[themat]-=difference
        #Then break the loop, since we're done and otherwise we'd go on removing bits of the atmosphere forever.
        break
    else:
        #Otherwise, halve its percentage and reduce difference by the amount we reduced the atmosphere 
        oldperc=atmosphere[themat]
        atmosphere[themat]=oldperc/2
        difference-=oldperc/2

#Then, finally, we correct any overcorrections the previous block made.
difference=(sum(atmosphere.values())-1)
if difference < 0:
    #Choose a random mat
    themat=random.choice(list(atmosphere.keys()))
    #Then add enough to it that the total is 1.
    atmosphere[themat]+=difference

Sorry if I've missed something obvious, or am not providing an important bit of information, but I'm tired at the moment, and I've been trying to figure this out for days now.

Martijn Pieters
Schilcote
  • "Floating point" and "exactly" are not words which should appear in the same sentence together. – interjay Jul 14 '13 at 16:33
  • Do you have an acceptable level of error? – Chris.Stover Jul 14 '13 at 16:51
  • @interjay I disagree: "Every 32 bit binary integer has an exact representation in IEEE754 64 bit binary floating point." is a sentence that is true, useful to know, and includes both "exact" and "floating point". – Patricia Shanahan Jul 14 '13 at 17:40
  • @Chris.Stover Anything past three digits is just sitting there looking cool. I chose five as a number off the top of my head. I suppose an obvious if inelegant solution would be to multiply the number by 100000 or something and convert it into an integer, then divide again when the decimal is needed. – Schilcote Jul 14 '13 at 17:51
  • With a bit more thought, another, better way to do it would be to simply round the sum whenever it's being checked. 0.99999999 or 1.000001 are close enough, in this specific situation. If no-one sees any problems with this solution, I'll answer the question myself. – Schilcote Jul 14 '13 at 17:56
  • Pascal Cuoq has given the best answer so far. But why do you care? What benefit do you derive from making the sum exactly 1? If you are doing some subsequent calculation which relies on the sum being exactly 1, then tell us about that, and there may be better solutions. Given a set of numbers in [0, 1] that are added in a specific order that sum to almost 1, it is not hard to adjust them so the sum is exactly one. Adding/subtracting a small number of half-ULPs of 1 to one of the numbers will suffice (or the change can be distributed among them). But summing to 1 is likely not the real problem. – Eric Postpischil Jul 14 '13 at 20:03
  • Note that summing the numbers in a different order will likely produce a different sum, since different rounding errors will occur during the addition. So producing a set of numbers whose mathematical sum is exactly 1 is a different problem from producing a sequence whose floating-point sum is 1. Given these slight errors, you might as well get close, then simply declare the sum to be 1, and ignore the actual floating-point sum, depending on what your actual goal is. – Eric Postpischil Jul 14 '13 at 20:04

5 Answers

6

If you mean to find two values that add up to 1.0

I understand that you want to pick two floating-point numbers between 0.0 and 1.0 such that they add to 1.0.

Do this:

  • pick the larger of the two, L. It has to be between 0.5 and 1.0.
  • define the smallest number S as 1.0 - L.

Then in floating-point, S + L is exactly 1.0.


If for some reason you obtain the smallest number S first in your algorithm, compute L = 1.0 - S and then S0 = 1.0 - L. Then L and S0 add up exactly to 1.0. Consider S0 the “rounded” version of S.
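A minimal sketch of the two-value recipe in Python (the variable names are just for illustration):

import random

L = random.uniform(0.5, 1.0)   # the larger value, between 0.5 and 1.0
S = 1.0 - L                    # the smaller value, defined from L
assert S + L == 1.0            # exact in floating point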

If you mean several values X1, X2, …, XN

Here is an alternative solution if you are adding N numbers, each between 0.0 and 1.0, and expect the operations X1 + X2 + … and 1.0 - X1 … to behave like they do in math.

Each time you obtain a new number Xi, do: Xi ← 1.0 - (1.0 - Xi). Only use this new value of Xi from that point onwards. This assignment will slightly round Xi so that it behaves well in all sums whose intermediate results are between 0.0 and 1.0.

EDIT: after doing the above for values X1, …, XN-1, compute XN as 1 - X1 - … - XN-1. This floating-point computation will be exact (despite involving floating-point), so that you will have X1 + … + XN = 1 exactly.
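A sketch of the N-value version in Python, assuming the running total stays between 0.0 and 1.0 (the way each Xi is drawn here is only illustrative):

import random

def random_fractions(n):
    xs = []
    remaining = 1.0
    for _ in range(n - 1):
        x = random.uniform(0.0, remaining / 2)
        x = 1.0 - (1.0 - x)      # round Xi so that later sums in [0, 1] behave exactly
        xs.append(x)
        remaining -= x           # running value of 1 - X1 - ... - Xi
    xs.append(remaining)         # XN = 1 - X1 - ... - X(N-1)
    return xs

parts = random_fractions(5)
print(sum(parts))                # should print exactly 1.0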

Pascal Cuoq
  • There is no guarantee that this will work. The problem is that there will be numbers which cannot be represented in binary form exactly. This is similar to representing 10/3 in a precise form as an integer. It's impossible by nature, but it can work for some values. – Devolus Jul 14 '13 at 16:40
  • Another problem is that he states there will be a bunch of percentages. It sounds like this technique might not work with more than 2 percentages. – Chris.Stover Jul 14 '13 at 16:41
  • @Devolus I think there is a guarantee that this will work. It is a theorem. Do you have some counter-example to show with two numbers L and S? – Pascal Cuoq Jul 14 '13 at 16:43
  • @Chris.Stover You are right, I was not sure that I was understanding the question, this is why I rephrased it at the beginning of my answer. I would have to think about the problem if the question is for more than two values. – Pascal Cuoq Jul 14 '13 at 16:44
  • @Chris.Stover I think I have a solution for N variables, see edit. – Pascal Cuoq Jul 14 '13 at 17:01
  • That almost works. Take the doubles 0.333333, 0.333333, 0.3333335 and 0.0000005, then apply 1-(1-xi) to each. Then all the possible sums are the same, but they slightly differ from 1 (all the sums are the predecessor of 1). – aka.nice Oct 28 '14 at 18:37
  • @aka.nice Ah yes, the N variables solution makes addition behave as fixed-point between all the variables but does not enforce the “sum equal 1” constraint. I guess I forgot to specify to use `1 - X1 - … -Xn-1` as the value of `Xn`. – Pascal Cuoq Oct 28 '14 at 19:06
  • @aka.nice And there remain some corner cases, if Xn must absolutely not be negative but all the previous Xi have been rounded up, or if N comes close to or goes beyond 2^significandwidth. – Pascal Cuoq Oct 28 '14 at 20:18
5

From your code it looks like you're randomly generating planet atmospheres, presumably for some kind of game or something. At any rate, the randomness of it suggests it doesn't need to be too accurate.

So I'd suggest you don't use floats; just use ints and go up to 100. Then you'll get your exact summing. For any maths you want to use them in, just cast.
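For instance, here is a rough sketch of that idea (the material names and the way amounts are drawn are only illustrative):

import random

def random_atmosphere(materials):
    shares = {}
    remaining = 100                          # whole percentage points to hand out
    while remaining > 0:
        mat = random.choice(materials)
        amount = random.randint(1, min(50, remaining))
        shares[mat] = shares.get(mat, 0) + amount
        remaining -= amount
    return shares                            # sum(shares.values()) == 100, always

shares = random_atmosphere(["N2", "O2", "CO2", "Ar"])
fractions = {mat: pct / 100 for mat, pct in shares.items()}   # cast only when a fraction is needed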

Is this not an option?

If you insist on using floats, then read on...

The problem you have using floats is as follows:

A floating point (in this case a double) is represented like this:

[diagram from Wikipedia: the IEEE 754 double-precision layout, with 1 sign bit, 11 exponent bits (e) and 52 fraction bits (M)]

which corresponds to a double of value:

[formula from Wikipedia: value = (-1)**sign * (1 + M) * 2**(e - 1023)]

So,

your number is (1+M) * 2**(E) (where E = e - offset; the offset is 1023 for a double)

1+M is always in the range [1, 2).

So, we have equally spaced numbers in between each pair of powers of two (positive and negative), and the spacing between the numbers doubles with each increase in the exponent, E.

Think about this: it means that there is a constant spacing of representable numbers within each of these ranges [(1,2), (2,4), (4,8), etc.]. This also applies to the negative powers of two, so:

0.5 - 1
0.25 - 0.5
0.125 - 0.25
0.0625 - 0.125
etc.

And each range contains the same quantity of representable numbers. This means that if you take a number in the range (0.25, 0.5) and add it to a number in the range (0.5, 1), then you have a 50% chance that the sum cannot be exactly represented.
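In Python 3.9+ you can see that doubling of the spacing directly with math.ulp:

import math

# The gap between adjacent doubles doubles from one binade to the next.
print(math.ulp(0.3))   # spacing in (0.25, 0.5), i.e. 2**-54
print(math.ulp(0.7))   # spacing in (0.5, 1), i.e. 2**-53, twice as large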

If you sum two floating point numbers for which the exponents differ by D, then the chances of the sum being exactly representable are 2**(-D).

If you then want to represent the range 0-1, then you'll have to be very careful about which floats you use (i.e. force the last N bits of the fraction to be zero, where N is a function of E).

If you go down this route, then you'll end up with far more floats at the top of the range than the bottom.

The alternative is to decide how close to zero you want to be able to get. Let's say you want to get down to 0.0001.

0.0001 = (1+M) * 2**E

log2(0.0001) = -13.28771...

So we'll use -14 as our minimum exponent.

And then to get up to 1, we just leave the exponent as -1.

So now we have 13 ranges, each with twice as many values as the one below it, which we can sum without having to worry about precision.

This also means, though, that the top range has 2**13 more values we can use. This obviously isn't okay.

So, after picking a float, round it to the nearest allowable value. In this case, by "round" I just mean set the last 13 bits of the fraction to zero. Put it all in a function, and apply it to your numbers immediately after you get them out of random.uniform.

Something like this:

from ctypes import *

def roundf(x, bitsToRound):
    # Reinterpret the float's bit pattern as a 32-bit integer.
    i = cast(pointer(c_float(x)), POINTER(c_int32)).contents.value
    # Set the last `bitsToRound` bits of the fraction to zero.
    bits = bin(i)
    bits = bits[:-bitsToRound] + "0" * bitsToRound
    i = int(bits, 2)
    # Reinterpret the masked bit pattern as a float again.
    y = cast(pointer(c_int32(i)), POINTER(c_float)).contents.value
    return y
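A usage sketch (the arguments are illustrative; 13 matches the exponent budget worked out above):

import random

# Quantize each fraction immediately after drawing it, as suggested above.
x = roundf(random.uniform(0.001, 0.5), 13)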


will
  • You should pick a power of 2 instead of 100 so that you get floats that add to 100. – tmyklebu Jul 15 '13 at 02:39
  • This is essentially what the code in my answer does, except with 9007199254740992 instead of 100. – Pascal Cuoq Jul 15 '13 at 08:56
  • @tmyklebu Why would picking a power of two allow the floats to sum to 1? – will Jul 15 '13 at 12:51
  • @PascalCuoq I'm not entirely sure your answer will work though. There's no guarantee that the sum of two floating point numbers can be represented, and there is no guarantee that `(1-sum(floats))` can be exactly represented either. – will Jul 15 '13 at 12:54
  • “There's no guarantee that the sum of two floating point numbers can be represented” That's what Devolus said, too. But my proposal, if you read it, is not to just use floating-point numbers and assume floating-point arithmetic on them is exact. – Pascal Cuoq Jul 15 '13 at 14:35
  • @PascalCuoq Ah, so your `1-(1-X_i)` is actually doing the same thing as my `roundf`, just in a more elegant way. I see now. – will Jul 15 '13 at 14:58
  • In a way, your approach is to this blog post (to round to the nearest integer—same problem at a different precision): http://blog.frama-c.com/index.php?post/2013/05/03/nearbyintf2 what my approach is to this blog post: http://blog.frama-c.com/index.php?post/2013/05/04/nearbyintf3 . They both can have their uses. Using floating-point arithmetic sets floating-point flags (such as Inexact) whereas fiddling with the representation's bits does not. – Pascal Cuoq Jul 15 '13 at 15:04
  • @PascalCuoq "binade". There should be a word to describe this, I agree! – will Jul 15 '13 at 15:33
  • @PascalCuoq I very much like the third solution by the way. – will Jul 15 '13 at 15:51
  • @will: I think I misinterpreted your answer. I was reading it as: "Just pick random ints that sum to 100, then divide the suckers by 100. Nothing could possibly go wrong!" This approach that I mistakenly projected onto your answer obviously runs into representability issues that are alleviated if you take a power of two instead of 100. – tmyklebu Jul 15 '13 at 20:52
1

In the end, it turned out the simplest solution was to change the problem. Rounding the sum to 5 digits of precision with round(x,5) whenever it was checked gave adequate results.
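A minimal sketch of that workaround (the values are only for illustration):

values = [0.13307, 0.25, 0.11693, 0.5]
total = round(sum(values), 5)      # round the sum whenever it is checked
print(total == 1)                  # True, even if the raw sum carries a tiny float error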

Schilcote
0

Since floats are stored in the machine in a binary representation, there are always numbers that cannot be precisely represented. If you need to work around this limitation, you must use a math library that provides custom-defined datatypes.
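One such datatype that ships with Python's standard library is fractions.Fraction, which does exact rational arithmetic; a small sketch with illustrative values:

from fractions import Fraction

parts = [Fraction(13307, 100000), Fraction(1, 4),
         Fraction(11693, 100000), Fraction(1, 2)]
print(sum(parts) == 1)   # True: no binary rounding is involved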

Devolus
  • This answer does not address the question. After a set of numbers has been selected by the methods described, we have a set of representable numbers. We are not concerned with being able to represent esoteric values, just with adjusting values so the sum is one. This is a feasible goal. – Eric Postpischil Jul 14 '13 at 20:06
-2

Floats are represented by powers of two. From the Python docs: "Unfortunately, most decimal fractions cannot be represented exactly as binary fractions"

http://docs.python.org/2/tutorial/floatingpoint.html

EDIT: Maybe instead of actually trying to get to 1.0000000000000000000000 you should determine an acceptable level of error by cutting off anything after the third decimal place. You can then be relatively certain that the values add up to 1. Using this concept you could accept any answer greater than 0.999 and less than 1.001.
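A quick sketch of that check, using the 0.999 to 1.001 band suggested above (the values are illustrative):

values = [0.333, 0.333, 0.334]
total = sum(values)
print(0.999 < total < 1.001)   # treat the total as "close enough" to 1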

This may not be perfect but it might be a good workaround to get you past your problem.

Chris.Stover
  • This is true, but not the real cause of the problem. If base-2 fixed-point was used instead of floating-point, most decimal fractions could still not be represented exactly, but it would still be possible to sum to exactly 1 as the OP wants. – interjay Jul 14 '13 at 16:42
  • I disagree; it would be possible for every value to be represented as a value that is greater than its true fractional value. If you had a bunch of values that were 0.3%, then there would be rounding up on every value. – Chris.Stover Jul 14 '13 at 16:48
  • There would be rounding up, but after all the rounding, you could make small changes in some of the numbers to make them add to 1 (when using fixed-point). When using floating-point it's not that easy. – interjay Jul 14 '13 at 17:03