
I am trying to make an array of N+1 bins for a distribution of N discrete scores.

I assumed numpy's arange could be used for that. However, the function gives me odd values, which have a significant effect on the resulting numpy histograms. Here's a minimal example:

import numpy as np

n = 10
a = np.arange(0, 1.01 + 1/n, 1/n)

print(a)
for i in a:
    print(i)

[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.  1.1]

0.0
0.1
0.2
0.30000000000000004
0.4
0.5
0.6000000000000001
0.7000000000000001
0.8
0.9
1.0
1.1

The fact that simply printing the array outputs seemingly normal values makes this extra misleading. It is a big issue if I want to use this array as the bins argument to numpy's histogram() function, because my values are k/10 decimals. In particular, all data points with a value of 0.7 will be placed in the [0.6000000000000001, 0.7000000000000001] bin, whereas I'd expect them to fall inside [0.7, 0.8], as per the np.histogram() documentation.
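Here is a minimal sketch of the effect on np.histogram (the example scores are made up for illustration):

import numpy as np

n = 10
a = np.arange(0, 1.01 + 1/n, 1/n)

scores = [0.7, 0.7, 0.7]   # points that should land in the [0.7, 0.8) bin
counts, edges = np.histogram(scores, bins=a)
print(counts)
# All three counts end up in the bin whose upper edge is 0.7000000000000001,
# i.e. [0.6000000000000001, 0.7000000000000001), rather than in [0.7, 0.8).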

The question is whether this is a bug or a feature.

Alex Ten
  • [`np.linspace`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html) is probably better suited for this e.g. `np.linspace(0, 1.1, 12)`. You can also use [`np.around`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.around.html) to specify the number of decimals. In any case, if decimal precision is so important to you, you may consider multiplying your data by 10, rounding and working with integers. – jdehesa Apr 26 '18 at 13:38 (a sketch of these suggestions follows the comments)
  • @DavidG Maybe, but what would be a good way to approach the binning problem in a general way? – Alex Ten Apr 26 '18 at 13:40
  • @DavidG They have the same answer (as they deal with the same effect), but they are different - the question you have linked is about the floating point operation `0.1 + 0.2 == 0.3`. – abukaj Apr 26 '18 at 13:49
  • @jdehesa thanks. I guess the main problem is that my data points are not represented in the same way. Although, I am just using np.mean() over a binary vector to get my scores. So I guess np.mean, or np.sum(a)/N round the outputs implicitly? – Alex Ten Apr 26 '18 at 13:51
  • The 'extra question' should be asked as a separate question, and yes, there is a solution. – abukaj Apr 26 '18 at 13:51
  • @abukaj Thanks for the tip. I guess I've got what I wanted to know. – Alex Ten Apr 26 '18 at 14:07
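
A minimal sketch of the `np.around` and integer-scaling suggestions from jdehesa's comment above (the example scores and the exact bin layout are assumptions for illustration):

import numpy as np

n = 10

# Round the arange edges back to one decimal so they compare equal to values like 0.7.
edges = np.around(np.arange(0, 1.01 + 1/n, 1/n), decimals=1)

scores = np.array([0.3, 0.7, 0.7, 1.0])
counts, _ = np.histogram(scores, bins=edges)
print(edges)   # [0.  0.1 0.2 ... 1.1]
print(counts)  # the 0.7 scores now fall in the [0.7, 0.8) bin

# Alternative: multiply by 10, round to integers and bin those, avoiding floats entirely.
int_counts, _ = np.histogram(np.rint(scores * 10).astype(int), bins=np.arange(12))
print(int_counts)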

2 Answers


The problem is that certain decimal numbers cannot be represented exactly as binary floats, so they are converted to the closest representable float.

When you print them without an explicit precision limit, the decimal representation you see may correspond to a slightly different number than the one you intended.

To limit the output to e.g. 5 significant digits, use print('{:.5g}'.format(i)).
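
For example, applied to the loop from the question (a minimal sketch):

import numpy as np

a = np.arange(0, 1.01 + 1/10, 1/10)
for i in a:
    print('{:.5g}'.format(i))   # prints 0, 0.1, 0.2, 0.3, ... instead of 0.30000000000000004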

abukaj

This is not related to numpy at all; the binary representation of these numbers is simply not exact.

For example, 0.3 cannot be represented with a finite number of binary digits. That is why you are getting results that are close to what you expect but not exactly equal.

You might be surprised when you see this:

>>> print('{:.50}'.format(0.3))
0.2999999999999999888977697537484345957636833190918
BcK