Edit 2020-11-21: (Added section starting here)
Reason for discrepancy:

A peek at the source code for `numpy.linspace` reveals the following:
- First, the `step` is calculated from the given `start`, `stop`, and `num`. There's only one way to do this, and there's nothing wrong or imprecise in how `step` is being computed.
- Successive output values are produced by taking the base value of `start` and adding increasingly higher multiples of `step` to it. So, for example, the first value is produced by adding `(0 * step)` to `start`, the next value by adding `(1 * step)` to `start`, and so on. Again, nothing wrong, unusual, or imprecise in this. Remember that the primary promises of `linspace()` are to ensure uniformity of the spacing, and to ensure that exactly the requested number of samples is output.
- If `endpoint=True` in the call to `linspace()`, the last output value is forced to be equal to the specified `stop` value. Again, for most use-cases, this would seem the more desirable thing to do, rather than outputting a computed value as the last value, no matter how cleverly that last value is computed.
So, where is the imprecision coming from?

I can think of three candidate reasons for the discrepancy being discussed here:
- Wrong expectation for the given input values of `start`, `stop` and `num`: This is always worth ruling out first, though here the expectation actually checks out. In an ideal world with no FP limitations, the spacing would be `(stop - start)/(num - 1) = (2149.05 - 2110.35)/774 = 38.7/774`, which is exactly the round value `0.05`, so `x[-3]` would equal `2148.95` precisely. (Had `num` been slightly different, say `num=776`, the ideal spacing would be `38.7/775 ≈ 0.0499355`, and no amount of computational cleverness could make `x[-3]` equal `2148.95`.) So in this particular case, the discrepancy must come from the two FP-related reasons below.
- The `start` and `stop` values themselves happen to be incapable of being precisely represented in FP. (Remember that most decimal fractions can be represented in FP only with some imprecision. The notable exceptions are fractions built from negative powers of 2.) So, with this `start` and `stop`, what we see is not what the computer sees. This obviously affects the computation of `step`. The imprecision in the computed `step` is roughly `num` times smaller than the imprecision in `(stop - start)`. However, note that towards the end of the output sequence, this imprecision has an increasing impact, as increasingly higher multiples of `step` get added to `start`. Here's a demonstration of this:
```python
print(format(2110.35, ".40g"))
print(format(2149.05, ".40g"))
print(format(0.125, ".40g"))   # 0.125 is an FP-friendly value (it is 2**(-3))
```

outputs:

```
2110.349999999999909050529822707176208496
2149.050000000000181898940354585647583008
0.125
```
- (As OP is already aware) Each computed output value, however precisely the computation is done, may or may not be representable precisely in FP. In practice, this inherent imprecision of FP will affect most of the generated output values.
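Here is a quick demonstration (mine, not the OP's) of reasons 2 and 3 combined, using the OP's numbers; the printed digits are from a typical IEEE-754 double system and may vary in the last places:

```python
import numpy as np

start, stop, num = 2110.35, 2149.05, 775

# In exact arithmetic, (stop - start)/(num - 1) = 38.7/774 = 0.05 exactly.
# In FP, start and stop are both slightly off, so the computed step is too:
step = (stop - start) / (num - 1)
print(format(step, ".20g"))    # close to, but not exactly, 0.05

# Toward the end of the sequence, large multiples of step magnify the error:
x = np.linspace(start, stop, num)
print(format(x[-3], ".20g"))   # close to, but not exactly, 2148.95
print(x[-3] == 2148.95)        # quite possibly False
```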
Hypothetically speaking, it could be argued that perhaps the `numpy` developers could have inserted extra interventional logic -- they could have tried to somehow figure out which of the output values can be precisely represented in FP, and for those output positions, output forced true values rather than computed values. Such an intervention is currently done only for the last output value, when `endpoint=True`. However, such excessive intervention would violate the primary promise of regularity of spacing, in addition to affecting speed.
Another hypothetical debate could be on whether the successive values should be generated by cumulative addition, rather than by `(start + i*step)`. That is, the second value would be computed as `(first + step)`, the third as `(second + step)`, and so on. I am not sure why such an implementation was not chosen, but I doubt it would bring any benefits in precision. The factors causing the imprecision now would still apply (the inherent limitation of FP in representing the computed values, and the imprecision in `start`, `stop` and `step`), and cumulative addition additionally accumulates a fresh rounding error at every step. It would also probably be slower, since each value can be computed only after the previous one has been computed.
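A quick experiment (my illustration, under the assumption of IEEE-754 doubles) comparing the two schemes against exact rational arithmetic supports this:

```python
from fractions import Fraction
import numpy as np

start, stop, num = 2110.35, 2149.05, 775
step = (stop - start) / (num - 1)

# Scheme A: start + i*step -- one multiply and one add per value.
a = start + step * np.arange(num)

# Scheme B: cumulative addition -- one new rounding error per value.
b = np.empty(num)
b[0] = start
for i in range(1, num):
    b[i] = b[i - 1] + step

# Exact reference: start + i*step evaluated in rational arithmetic.
ref = [Fraction(start) + i * Fraction(step) for i in range(num)]
err_a = max(abs(Fraction(float(v)) - r) for v, r in zip(a, ref))
err_b = max(abs(Fraction(float(v)) - r) for v, r in zip(b, ref))
print(float(err_a), float(err_b))   # err_b is typically the larger one
```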
To summarize: rather than the `linspace` algorithm being sub-optimal, the real culpable reasons are a wrong expectation (or wrong input) in the general case, and -- as in this case -- the inherent limitations of FP representation, which affect `start`, `stop`, and the computed values.
Edit 2020-11-21: (Added section ends here)
Your little test:

```python
print(y, type(y))
```

producing the output:

```
2148.95 <class 'numpy.float64'>
```

does not necessarily prove that `y` contains exactly `2148.95`.
That's because, when you call `print(y)`, it internally invokes `str(y)`, and `str(y)` is free to round its output to fewer digits than the underlying value carries. And even if `y` is the double closest to `2148.95`, that double is not exactly `2148.95`, because `2148.95` has no exact binary representation.
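To see the digits `y` actually holds, ask for them explicitly instead of relying on `print`; here `y` is a stand-in value that I assume prints as `2148.95`:

```python
import numpy as np

y = np.float64(2148.95)    # stand-in for OP's y, which prints as 2148.95

print(y, type(y))          # 2148.95 <class 'numpy.float64'>
print(format(y, ".40g"))   # the full decimal expansion of the stored double
```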
Suggested Actions:
- Before suspecting `linspace()`, verify that `num` is consistent with your expected `step`: you need `num = (stop - start)/step + 1`, and here `38.7/0.05 + 1 = 775`, so `num=775` is indeed the right value for an intended `step=0.05`.
- To perform a test such as `if x[-3] <= y:`, do something like `if round(x[-3], 5) <= round(y, 5):` instead (here, I'm assuming that you care about no more than 5 digits after the decimal point).
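For example (a sketch; `y` here is a stand-in for the OP's variable, and the 5-decimal tolerance is just the assumption stated above):

```python
import numpy as np

x = np.linspace(2110.35, 2149.05, 775)
y = np.float64(2148.95)   # stand-in for OP's y

# Compare after rounding to the precision you actually care about:
if round(x[-3], 5) <= round(y, 5):
    print("x[-3] <= y, at 5-decimal precision")

# A common tolerance-based alternative:
print(np.isclose(x[-3], y))   # True when they differ only by FP noise
```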