2

I'm trying to return a vector (1-D numpy array) that sums to 1.0. The key is that it has to equal exactly 1.0, as it represents a percentage distribution. However, there seem to be a lot of cases where the sum does not equal 1.0 even after I divide each element by the total. In other words, the sum of `x` does not equal 1.0 even when x = x'/sum(x').

One of the cases where this occurred was the vector below.

x = np.array([0.090179377557090171, 7.4787182000074775e-05, 0.52465058646452456, 1.3594135000013591e-05, 0.38508165466138505])

The summation of this vector, `x.sum()`, is 1.0000000000000002, whereas the summation of the vector divided by that value is 0.99999999999999978. Renormalizing again just bounces back and forth between those two results.
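A minimal reproduction (the values shown in the comments are what I observe; the exact last digits may differ by platform or NumPy version):

```python
import numpy as np

x = np.array([0.090179377557090171, 7.4787182000074775e-05,
              0.52465058646452456, 1.3594135000013591e-05,
              0.38508165466138505])

print(x.sum())       # 1.0000000000000002 here -- not exactly 1.0
y = x / x.sum()      # renormalize by the total
print(y.sum())       # 0.9999999999999998 here -- still not exactly 1.0
```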

What I did was round the elements of the vector to the 10th decimal place (`np.round(x, decimals = 10)`) and then divide by the sum, which results in a sum of exactly 1.0. This works when I know the size of the numerical error; unfortunately, that would not be the case in usual circumstances.
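In code, that workaround looks like this (note that `decimals=10` is specific to this vector's error size, which is exactly the problem):

```python
import numpy as np

x = np.array([0.090179377557090171, 7.4787182000074775e-05,
              0.52465058646452456, 1.3594135000013591e-05,
              0.38508165466138505])

xr = np.round(x, decimals=10)  # discard the error below the 10th decimal
y = xr / xr.sum()              # then renormalize
print(y.sum())                 # exactly 1.0 for this particular vector
```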

I'm wondering if there is a way to correct the numerical error when only the vector itself is known (and not the size of the error), so that the sum will equal exactly 1.0.

Edit: Regarding "Is floating point math broken?" — that question doesn't answer mine, as it only explains why the difference occurs, not how to resolve the issue.

Cody Chung
  • Possible duplicate of [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) – ForceBru Aug 02 '19 at 10:04

1 Answer

3

A bit of a hacky solution:

x[-1] = 0
x[-1] = 1 - x.sum()

Essentially, this shoves the numerical error into the last element of the array. (No rounding beforehand is needed.)
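Applied to the vector from the question, the idea looks like this (a sketch; the two assignments are deliberately kept separate):

```python
import numpy as np

x = np.array([0.090179377557090171, 7.4787182000074775e-05,
              0.52465058646452456, 1.3594135000013591e-05,
              0.38508165466138505])

x /= x.sum()           # renormalize; the sum may still be one ULP off
x[-1] = 0              # zero out the last element first...
x[-1] = 1 - x.sum()    # ...then absorb the entire residual into it
print(x.sum() == 1.0)  # True
```

Zeroing `x[-1]` first matters: the residual is then computed with the very same summation order that the final `x.sum()` will use, so the error cancels exactly.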

Note: A mathematically simpler solution:

x[-1] = 1.0 - x[:-1].sum()

does not work, due to the different behavior of `numpy.sum` on a whole array versus a slice: the two can use different summation orders, so the subtraction no longer cancels the error exactly.

Misza
  • 1
    Doesn't work for (sorry, it's a bit longish) `[0.00329, 0.00662, 0.00015, 0.00599, 0.00716, 0.00977, 0.012, 0.00821, 0.01535, 0.01373, 0.00785, 0.00495, 0.01015, 0.01269, 0.0048, 0.0098, 0.0014, 0.01145, 0.00672, 0.00839, 0.01039, 0.00806, 0.00343, 0.01039, 0.01003, 0.01293, 0.00399, 0.00411, 0.01115, 0.01521, 0.01416, 0.01484, 0.00695, 0.00939, 0.00704, 0.00153, 0.00177, 0.015, 0.00842, 0.00152, 0.00507, 0.01366, 0.00039, 0.01478, 0.01384, 0.00819, 0.00235, 0.01299, 0.00329, 0.00807, 0.01205, 0.01507, 0.00909, 0.00721, 0.009, 0.00597, 0.00289, 0.00677, 0.01353, 0.01214, 0.00027, 0.00641,` – Paul Panzer Aug 02 '19 at 11:42
  • `0.0072, 0.00619, 0.00235, 0.00866, 0.01043, 0.00069,0.00339, 0.00227, 0.0122, 0.00307, 0.0121, 0.01124, 0.00459, 0.00753, 0.00522, 0.01116, 0.00778, 0.00577, 0.01287, 0.01541, 0.00744, 0.00384, 0.00364, 0.01181, 0.01569, 0.00472, 0.00445, 0.00361, 0.00195, 0.0074, 0.01486, 0.0042, 0.00988, 0.00392, 0.00306, 0.01487, 0.00316, 0.00969, 0.00361, 0.00653, 0.00205, 0.00924, 0.00248, 0.00538, 0.01237, 0.01002, 0.0052, 0.01371, 0.00215, 0.00288, 0.0061, 0.01096, 0.01083, 0.01333, 0.00946, 0.00049, 0.0159, 0.00551, 0.01367, 0.00834, 0.00069, 0.0053, 0.00472, 0.01496, 0.00063, 0.011370000000000324]` – Paul Panzer Aug 02 '19 at 11:42
  • @PaulPanzer Your example data works fine for me: [code](https://gist.github.com/Misza13/218ef146684073684f4dc7ccbaf033c8) – Misza Aug 02 '19 at 11:52
  • You are not using numpy sum. – Paul Panzer Aug 02 '19 at 11:54
  • 1
    @PaulPanzer Thanks, I understand now. Updated the answer and left the original as a warning (for those who would want to refactor). Leaves me wondering why `np.sum` behaves differently on slices. – Misza Aug 02 '19 at 12:49
  • It's not so much slice vs. non-slice, but a multiple of 8 terms vs. a non-multiple of 8 terms. These lead to slightly different orders of summation. – Paul Panzer Aug 02 '19 at 13:22