44

I know rounding errors happen in floating point arithmetic but can somebody explain the reason for this one:

>>> 8.0 / 0.4  # as expected
20.0
>>> floor(8.0 / 0.4)  # int works too
20
>>> 8.0 // 0.4  # expecting 20.0
19.0

This happens on both Python 2 and 3 on x64.

As far as I see it this is either a bug or a very dumb specification of // since I don't see any reason why the last expression should evaluate to 19.0.

Why isn't a // b simply defined as floor(a / b) ?

EDIT: 8.0 % 0.4 also evaluates to 0.3999999999999996. At least this is consequent since then 8.0 // 0.4 * 0.4 + 8.0 % 0.4 evaluates to 8.0

EDIT: This is not a duplicate of Is floating point math broken? since I am asking why this specific operation is subject to (maybe avoidable) rounding errors, and why a // b isn't defined as / equal to floor(a / b)

REMARK: I guess that the deeper reason why this doesn't work is that floor division is discontinuous and thus has an infinite condition number making it an ill-posed problem. Floor division and floating-point numbers simply are fundamentally incompatible and you should never use // on floats. Just use integers or fractions instead.

wjandrea
  • 28,235
  • 9
  • 60
  • 81
0x539
  • 1,489
  • 1
  • 13
  • 30
  • 4
    Interestingly, `'%.20f'%0.4` gives `'0.40000000000000002220'`, so `0.4` is apparently just a little bit over `0.4`. – khelwood Jul 26 '16 at 11:47
  • 2
    @khelwood how does `floor(8.0/0.4)` produce correct results? – Aswin Murugesh Jul 26 '16 at 11:49
  • 1
    @AswinMurugesh I don't know; that's why I haven't posted an answer to this question. – khelwood Jul 26 '16 at 11:50
  • 2
    First, floating-point numbers with the `float` type are usually wrong. Second, `//` and `%` are pretty unreliable (meaning, unexpected behavior) with negative numbers and `float` numbers. The [documentation on Decimal objects](https://docs.python.org/3/library/decimal.html#decimal-objects) briefly discusses `//` with negative integers and how the `Decimal` library handles it differently. – TigerhawkT3 Jul 26 '16 at 12:01
  • 1
    @TigerhawkT3 Yeah, well `%` and `//` are typically inteded as *integer* operations. In any case note that `x = 8.0;y=0.4;q = x//y; r = x%y` still satisfy the law: `q*y+r == x` which is the property you want from `//` and `%`. Unfortunately `math` has no way to compute the remainder so if you choose `q = math.floor(x/y)` you don't have a built-in way to find an `r` that satisfy that property. – Bakuriu Jul 26 '16 at 13:44
  • Don't use floating point for exact computation. –  Jul 26 '16 at 14:18
  • The real surprise, to me, is that `8 / 0.4 != 8.0 // 0.4` in Python 2.7. I had thought that part of the reason for the `//` operator was to preserve the behavior of Python 2's `/` operator for compatibility in Python 3 as a separate operator implemented in both 2 and 3. – Kyle Strand Jul 26 '16 at 16:09
  • 3
    Possible duplicate of [Is floating point math broken?](http://stackoverflow.com/questions/588004/is-floating-point-math-broken) – Alexander Vogt Jul 26 '16 at 20:45
  • 4
    @AlexanderVogt Not really, is it? The question is not why floating point results are not exact but more about why python does two different things for `floor(8.0/0.4)` and the"floor-division" `8.0//0.4`. – jotasi Jul 27 '16 at 03:45
  • 1
    [PEP 238](https://www.python.org/dev/peps/pep-0238/#semantics-of-floor-division) states that `a // b == floor(a/b)`. I think the PEP is buggy, or at least ambiguous. Also, `8.0 / 0.4 < 20.0 == False`. I can't figure out why this is considered desirable output for `//` with two float arguments. – japreiss Jul 09 '20 at 00:30

5 Answers5

33

As you and khelwood already noticed, 0.4 cannot be exactly represented as a float. Why? It is two fifth (4/10 == 2/5) which does not have a finite binary fraction representation.

Try this:

from fractions import Fraction
Fraction('8.0') // Fraction('0.4')
    # or equivalently
    #     Fraction(8, 1) // Fraction(2, 5)
    # or
    #     Fraction('8/1') // Fraction('2/5')
# 20

However

Fraction('8') // Fraction(0.4)
# 19

Here, 0.4 is interpreted as a float literal (and thus a floating point binary number) which requires (binary) rounding, and only then converted to the rational number Fraction(3602879701896397, 9007199254740992), which is almost but not exactly 4 / 10. Then the floored division is executed, and because

19 * Fraction(3602879701896397, 9007199254740992) < 8.0

and

20 * Fraction(3602879701896397, 9007199254740992) > 8.0

the result is 19, not 20.

The same probably happens for

8.0 // 0.4

I.e., it seems floored division is determined atomically (but on the only approximate float values of the interpreted float literals).

So why does

floor(8.0 / 0.4)

give the "right" result? Because there, two rounding errors cancel each other out. First 1) the division is performed, yielding something slightly smaller than 20.0, but not representable as float. It gets rounded to the closest float, which happens to be 20.0. Only then, the floor operation is performed, but now acting on exactly 20.0, thus not changing the number any more.


1) As Kyle Strand points out, that the exact result is determined then rounded isn't what actually happens low2)-level (CPython's C code or even CPU instructions). However, it can be a useful model for determining the expected 3) result.

2) On the lowest 4) level, however, this might not be too far off. Some chipsets determine float results by first computing a more precise (but still not exact, simply has some more binary digits) internal floating point result and then rounding to IEEE double precision.

3) "expected" by the Python specification, not necessarily by our intuition.

4) Well, lowest level above logic gates. We don't have to consider the quantum mechanics that make semiconductors possible to understand this.

Community
  • 1
  • 1
das-g
  • 9,718
  • 4
  • 38
  • 80
  • 2
    "it seems floored division is determined atomically" -- excellent guess, and I suppose correct semantically, but in terms of what the implementation must do, it's sort of backwards: since there's no hardware support that supports the "atomic" `//` semantics, the remainder is pre-calculated and subtracted from the numerator to ensure that the floating point division (when it finally occurs) simply *computes* the correct value immediately, without needing further adjustment. – Kyle Strand Jul 26 '16 at 16:25
  • 1
    Yeah, I'm using term "atomic" from the user's (i.e., Python programmer's) view, here. Similar to how e.g. certain database operations may be described as "atomic" that also do not map to a single hardware instruction. So I'm talking about effect, not implementation. – das-g Jul 26 '16 at 19:24
  • Apropos implementation, whether or not hardware supports a native instruction equivalent to Python's `//` operator would of course depend on the hardware and on the operand types. Early CPUs certainly had integer division support for integer operands. There might not be any chipset with native support for floored division of floats, but it wouldn't be inconceivable either, as it'd be merely impractical, not impossible. – das-g Jul 26 '16 at 19:32
  • 1
    "The same probably happens for `8.0//0.4`". Not really, at least for cpython. They actually rather do `round((8.0 - fmod(8.0, 0.4)) / 0.4)` which gives `19` because (at least for my machine/compiler version) `fmod(8.0/0.4)` results in `0.4` (also in pure C). See my answer for details. – jotasi Jul 27 '16 at 03:52
15

After checking the semi-official sources of the float object in cpython on github (https://github.com/python/cpython/blob/966b24071af1b320a1c7646d33474eeae057c20f/Objects/floatobject.c) one can understand what happens here.

For normal division float_div is called (line 560) which internally converts the python floats to c-doubles, does the division and then converts the resulting double back to a python float. If you simply do that with 8.0/0.4 in c you get:

#include "stdio.h"
#include "math.h"

int main(){
    double vx = 8.0;
    double wx = 0.4;
    printf("%lf\n", floor(vx/wx));
    printf("%d\n", (int)(floor(vx/wx)));
}

// gives:
// 20.000000
// 20

For the floor division, something else happens. Internally, float_floor_div (line 654) gets called, which then calls float_divmod, a function that is supposed to return a tuple of python floats containing the floored division, as well as the mod/remainder, even though the latter is just thrown away by PyTuple_GET_ITEM(t, 0). These values are computed the following way (After conversion to c-doubles):

  1. The remainder is computed by using double mod = fmod(numerator, denominator).
  2. The numerator is reduced by mod to get a integral value when you then do the division.
  3. The result for the floored division is calculated by effectively computing floor((numerator - mod) / denominator)
  4. Afterwards, the check already mentioned in @Kasramvd's answer is done. But this only snaps the result of (numerator - mod) / denominator to the nearest integral value.

The reason why this gives a different result is, that fmod(8.0, 0.4) due to floating-point arithmetic gives 0.4 instead of 0.0. Therefore, the result that is computed is actually floor((8.0 - 0.4) / 0.4) = 19 and snapping (8.0 - 0.4) / 0.4) = 19 to the nearest integral value does not fix the error made introduced by the "wrong" result of fmod. You can easily chack that in c as well:

#include "stdio.h"
#include "math.h"

int main(){
    double vx = 8.0;
    double wx = 0.4;
    double mod = fmod(vx, wx);
    printf("%lf\n", mod);
    double div = (vx-mod)/wx;
    printf("%lf\n", div);
}

// gives:
// 0.4
// 19.000000

I would guess, that they chose this way of computing the floored division to keep the validity of (numerator//divisor)*divisor + fmod(numerator, divisor) = numerator (as mentioned in the link in @0x539's answer), even though this now results in a somewhat unexpected behavior of floor(8.0/0.4) != 8.0//0.4.

jotasi
  • 5,077
  • 2
  • 29
  • 51
  • 2
    You seem to be the only person with the correct answer. Props! Since you had to dig into the sources to find it, though, I wonder if this is a mandatory part of all Python implementations? – Kyle Strand Jul 26 '16 at 16:04
  • 2
    It does appear that as of [PEP 238](https://www.python.org/dev/peps/pep-0238/), it was expected that `floor(a/b) == a // b` would be true, since that is explicitly stated as the semantics for "floor-division". – Kyle Strand Jul 26 '16 at 16:32
  • 1
    In the issue report (https://bugs.python.org/issue27463) already referenced by @0x539, it doesn't seem to be considered as wrong. and this is the python bugtracker. So I guess "floor-division" is more of a name than being ment to define the implementation. – jotasi Jul 26 '16 at 17:05
  • Hm, the documentation linked in that bug report *does* state that `x == (x//y)*y + (x%y)` must be true; so it appears that the behavior you describe *is* mandatory. – Kyle Strand Jul 26 '16 at 17:12
  • @KyleStrand Yes exactly that's what mentioned in links in my answer and Guido has slightly pointed to them, but we want to find the exact part in source code that does this action. – Mazdak Jul 26 '16 at 17:25
  • 2
    "The result for the floored division is calculated by effectively computing `floor((numerator - mod) / denominator)`" - no, it's more like `round((numerator - mod) / denominator)`. The source code does use `floor`, but then it immediately adjusts the result upward if `floor` rounded the wrong way. It relies on the `- mod` part to "effectively floor" `numerator / denominator`. – user2357112 Jul 26 '16 at 17:47
  • 1
    @user2357112 You are right. Effectively, the result is rather `round`ed than just `floor`ed. Nontheless, the `-mod` causes the strange result. – jotasi Jul 26 '16 at 17:49
  • By the way, here's the [actually-official `float` source code](https://hg.python.org/cpython/file/3.5/Objects/floatobject.c), if you'd rather use that than the Github mirror. – user2357112 Jul 26 '16 at 17:51
  • @Kasramvd Actually, the part where is is done is refernced above. It is the function you pointed out earlier in your answer, but it does not happen because of the shift in the end, but due to a floating point error in c's `fmod` that gives `fmod(8.0, 0.4) == 0.4` and not `0.0` and thusly the effective `floor` done by using `numerator-mod` instead of `numerator` is off. – jotasi Jul 26 '16 at 17:53
  • @jotasi Yes, I just figured it out, and as user2357112 said It relies on the `mod` part to "effectively floor" `numerator / denominator`. – Mazdak Jul 26 '16 at 17:56
  • @Kasramvd Exactly. But even though it's rather clear now, what is happening and that this is supposed to be that way, I am still not sure, if it is the most intuitive implementation and if it shouldn't be better documented at least. – jotasi Jul 26 '16 at 17:58
  • @jotasi Yes, there's a good discussion here which will declare most of the problems http://python-history.blogspot.co.uk/2010/08/why-pythons-integer-division-floors.html – Mazdak Jul 26 '16 at 18:13
10

@jotasi explained the true reason behind it.

However if you want to prevent it, you can use decimal module which was basically designed to represent decimal floating point numbers exactly in contrast to binary floating point representation.

So in your case you could do something like:

>>> from decimal import *
>>> Decimal('8.0')//Decimal('0.4')
Decimal('20')

Reference: https://docs.python.org/2/library/decimal.html

shiva
  • 2,535
  • 2
  • 18
  • 32
9

Ok after a little bit of research I have found this issue. What seems to be happening is, that as @khelwood suggested 0.4 evaluates internally to 0.40000000000000002220, which when dividing 8.0 yields something slightly smaller than 20.0. The / operator then rounds to the nearest floating point number, which is 20.0, but the // operator immediately truncates the result, yielding 19.0.

This should be faster and I suppose its "close to the processor", but I it still isn't what the user wants / is expecting.

Kyle Strand
  • 15,941
  • 8
  • 72
  • 167
0x539
  • 1,489
  • 1
  • 13
  • 30
  • 7
    Good find, that. But what *would* a user want here? Correct mathematical behavior on numbers that are inherently incorrect to begin with? (Of which that same average 'typical user' is [usually blissfully unaware](http://stackoverflow.com/questions/14368893/why-is-decimal-multiplication-slightly-inaccurate).) – Jongware Jul 26 '16 at 12:18
  • 1
    @RadLexus A User wants the best possible approximation for this operation. In this case that is `20.0` – 0x539 Jul 26 '16 at 12:24
  • 5
    @0x539: What about the poor users who are relying on `//` to truncate things slightly less than `20.0` to `19.0`? The problem here is that the user wants to do exact arithmetic and is using the wrong tools for the job. –  Jul 26 '16 at 14:22
  • 1
    Actually, truncation is not what happens, at least if I understand the sources of cpython correctly. They go through a rather large ordeal to keep the identity mentioned in your link by actually computing `floor((8.0 - fmod(8.0, 0.4)) / 0.4)` and the error is introduced by `fmod(8.0, 0.4)=0.4`. (See my answer for the link and more explanation). – jotasi Jul 26 '16 at 15:44
  • 3
    Mathematically, you'd be correct that `8.0 / 0.40000000000000002220` "yields something slightly smaller than `20.0`." However, it's not correct to think of floating-point operations happening as a series of steps in which the *actual real mathematical value* is calculated, and *then* rounded (which you imply when you say "The `/` operator *then* rounds..."). Of course, this wouldn't be possible, since computers *must* have a way to internally represent all intermediate steps of a computation! See @jotasi 's answer. – Kyle Strand Jul 26 '16 at 16:26
7

That's because there is no 0.4 in python (floating-point finite representation) it's actually a float like 0.4000000000000001 which makes the floor of division to be 19.

>>> floor(8//0.4000000000000001)
19.0

But the true division (/) returns a reasonable approximation of the division result if the arguments are floats or complex. And that's why the result of 8.0/0.4 is 20. It actually depends on the size of arguments (in C double arguments). (not rounding to nearest float)

Read more about pythons integer division floors by Guido himself.

Also for complete information about the float numbers you can read this article https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

For those who have interest, the following function is the float_div that does the true division task for float numbers, in Cpython's source code:

float_div(PyObject *v, PyObject *w)
{
    double a,b;
    CONVERT_TO_DOUBLE(v, a);
    CONVERT_TO_DOUBLE(w, b);
    if (b == 0.0) {
        PyErr_SetString(PyExc_ZeroDivisionError,
                        "float division by zero");
        return NULL;
    }
    PyFPE_START_PROTECT("divide", return 0)
    a = a / b;
    PyFPE_END_PROTECT(a)
    return PyFloat_FromDouble(a);
}

Which the final result would be calculated by function PyFloat_FromDouble:

PyFloat_FromDouble(double fval)
{
    PyFloatObject *op = free_list;
    if (op != NULL) {
        free_list = (PyFloatObject *) Py_TYPE(op);
        numfree--;
    } else {
        op = (PyFloatObject*) PyObject_MALLOC(sizeof(PyFloatObject));
        if (!op)
            return PyErr_NoMemory();
    }
    /* Inline PyObject_New */
    (void)PyObject_INIT(op, &PyFloat_Type);
    op->ob_fval = fval;
    return (PyObject *) op;
}
Mazdak
  • 105,000
  • 18
  • 159
  • 188
  • @Kasramvd Thanks for the extensive answer. Maybe I am just dense but I don't get, what you mean with "snapping to next integral value". Obviously, not all floating point divisions will be rounded to next integral value (`3./4.` will not give `1`). Therefore, the decision for that couldn't be as simple as you present it, as far as I understand it. Did I understand you correctly? – jotasi Jul 26 '16 at 14:24
  • 1
    Actually, after checking the source code myself, I guess floating point division is done in the function `float_div`, whereas `float_divmod` is only called by `float_floor_div` which does the floor-division which in turn gives the "wrong" result `19` instead of `20`. – jotasi Jul 26 '16 at 14:46
  • @jotasi Yes, exactly. It's more complicated than a simple snapping. And yes it's `float_div` function which does the true dive task. It seems that it calculates the final result somehow based on arguments size. I updated the answer. thanks for your attention. – Mazdak Jul 26 '16 at 16:55
  • I double checked with simply checking the important lines in c and apparently the important part is that they compute `8.0/0.4 = 20` by simple `double` division in c whereas the floor division actually computes `floor((8.0 - fmod(8.0, 0.4)) / 0.4) = 19` because `fmod(8.0, 0.4) = 0.4` due to floating point arithmetic. See my answer below for more info. – jotasi Jul 26 '16 at 17:02
  • The `float` in python is double in `C` and what python does in this case es exactly what happen in C. – Mazdak Jul 26 '16 at 17:14
  • @Kasramvd What do you mean by your last comment? C doesn't have a "floor division" operator. And I don't really see what "argument size" has to do with anything. – Kyle Strand Jul 26 '16 at 17:15
  • @KyleStrand I didn't say anything about `floor division` in C It's all about true division. I'm referring to `op = (PyFloatObject*) PyObject_MALLOC(sizeof(PyFloatObject));` and as I said we need more searching in source code to find the exact reason, or maybe since it's a C's behavior we need to check its source too! – Mazdak Jul 26 '16 at 17:20
  • 3
    "the fact is that it depends on size of available `PyFloatObjects`" - what? No it doesn't. `PyFloatObject`s are all the same size, and the storage details of `PyFloatObject`s have pretty much nothing to do with any of this behavior. – user2357112 Jul 26 '16 at 17:41
  • @user2357112 Yeas it's not, I was reading the source code and realized that it's irrelevant to the result. – Mazdak Jul 26 '16 at 17:46
  • @Kasramvd ..... C doesn't have "source." Anyway, what I was asking about was your sentence "what python does in this case es exactly what happen in C." It's not clear what "case" you're referring to here; that's why I mentioned that C doesn't have "floor division", because the "case" we're talking about in this whole question is "floor division." – Kyle Strand Jul 26 '16 at 18:00
  • @KyleStrand By C's source I meant it's documentations or any source related to C :-) not souce code!!! (it's meaning less) And the case I was referring to was true division because of the wired result of `8.0/0.4`. – Mazdak Jul 26 '16 at 18:08