5

Consider this (all commands run on an 64bit Arch Linux system):

  • Perl (v5.24.0)

    $ perl -le 'print 10190150730169267102/1000%10'
    6
    
  • awk (GNU Awk 4.1.3)

    $ awk 'BEGIN{print 10190150730169267102/1000%10}'
    6
    
  • R (3.3.1)

    > (10190150730169267102/1000)%%10
    [1] 6
    
  • bc

    $ echo 10190150730169267102/1000%10 | bc
    7
    
  • Python 2 (2.7.12)

    >>> print(10190150730169267102/1000%10)
    7
    
  • Python 3 (3.5.2)

    >>> print(10190150730169267102/1000%10)
    8.0
    

So, Perl, gawk and R agree, as do bc and Pyhon 2. Nevertheless, between the 6 tools tested, I got 4 different results. I understand that this has something to do with how very long integers are being rounded, but why do the different tools differ quite so much? I had expected that this would depend on the processor's ability to deal with large numbers, but it seems to depend on internal features (or bugs) of the language.

Could someone explain what is going on behind the scenes here? What are the limitations in each language and why do they behave quite so differently?

terdon
  • 3,260
  • 5
  • 33
  • 57
  • Overflow and aproximations in short terms. – John Doe Aug 23 '16 at 11:56
  • @JohnDoe OK, but why such a huge difference? – terdon Aug 23 '16 at 12:01
  • In R it's clear, I guess it's because `%%` has a higher precedence over `/`. In Python, `10190150730169267102/1000` is a huge number, I guess it's truncated or something.. – Maroun Aug 23 '16 at 12:05
  • @MarounMaroun good point about R, thanks. Question edited. And yes, I know it has something to do with huge numbers, but I am surprised to see such great differences between the different tools. That's what I'm trying to understand. – terdon Aug 23 '16 at 12:08
  • I think most of the difference has to do with how `%` behaves when one side is a floating-point number and the other is an integer. If you just calculate `10190150730169267102/1000`, there should be less disagreement. – jwodder Aug 23 '16 at 12:22
  • @terdon There are known issues on Perl (look at this question, maybe you find the asnwer for Perl: [link](http://stackoverflow.com/a/24493650/6594367) – John Doe Aug 23 '16 at 12:41
  • 4
    Perl, awk and R seem to convert to a double precision floating point number for the division, and the closest `double` to `10190150730169267102` is `10190150730169266176`, which explains the `6`. – Daniel Fischer Aug 23 '16 at 12:42
  • 3
    Re "*why such a huge difference*", It's a difference of 0.0000000000000001 of the original number. Hardly huge! That's 16 digits of precision! – ikegami Aug 23 '16 at 14:31
  • Add `-Mbigint` to the perl command line to get this done via integers instead of floating point there. – Tanktalus Aug 23 '16 at 15:05
  • @Tanktalus that what I had assumed too but I tested this (just didn't mention in the question) and I get the same result. `perl -MMath::BigInt -le 'print 10190150730169267102/1000%10; '` also prints 6. – terdon Aug 23 '16 at 15:10
  • @ikegami the huge difference is between the results. I see 6,7 and 8. That's a pretty significant difference. I assume it rises from a small rounding error but I am still surprised to see a range of 3 integer results. – terdon Aug 23 '16 at 15:11
  • 1
    @terdon - no, use `-Mbigint` not `-MMath::BigInt` and you'll get 7. What you did will only load the M::BI module, but won't convert all numbers to M::BI objects by default, only `bigint` does that. – Tanktalus Aug 23 '16 at 15:19
  • 1
    @terdon, No, a difference in the 17th significant digit is the very opposite of "pretty significant". – ikegami Aug 23 '16 at 15:22
  • @ikegami indeed, no argument there. It's just that that small difference leads to a very significant one in the final output. And I still don't get how such a tiny difference can lead to both 6,7 and 8 as a result. Does Perl round down while Python rounds up or something? – terdon Aug 23 '16 at 15:28
  • @Tanktalus ah, I see now. Yes indeed, thanks. – terdon Aug 23 '16 at 15:28
  • 2
    @terdon, That doesn't matter. The lesson you should be learning is that double-precision numbers are not appropriate for your solution. – ikegami Aug 23 '16 at 15:31
  • @Tanktalus `perl -Mbigint -le 'print 10190150730169267102/10000%10'` prints `6` here (v5.18.1). Did the behaviour change or is it a bug? – Daniel Fischer Aug 23 '16 at 15:32
  • @DanielFischer - bug. As in, the original question was `/1e3` not `/1e4` so your code has a bug :) – Tanktalus Aug 23 '16 at 15:41
  • @Tanktalus D'oh, copied the wrong line in the terminal to add `-Mbigint`. Indeed it prints 7 after the fix. Thanks. – Daniel Fischer Aug 23 '16 at 15:42
  • @ikegami heh, and that's a lesson well learnt. I'm still trying to understand why the tools behave so differently though. I hear what you're saying and get that the original difference is small, but I'm still trying to understand how that tiny difference can result in 3 separate integer results. – terdon Aug 23 '16 at 15:44

3 Answers3

9

You're seeing different results for two reasons:

  1. The division step is doing two different things: in some of the languages you tried, it represents integer division, which discards the fractional part of the result and just keeps the integer part. In others it represents actual mathematical division (which following Python's terminology I'll call "true division" below), returning a floating-point result close to the true quotient.

  2. In some languages (those with support for arbitrary precision), the large numerator value 10190150730169267102 is being represented exactly; in others, it's replaced by the nearest representable floating-point value.

The different combinations of the possibilities in 1. and 2. above give you the different results.

In detail: in Perl, awk, and R, we're working with floating-point values and true division. The value 10190150730169267102 is too large to store in a machine integer, so it's stored in the usual IEEE 754 binary64 floating-point format. That format can't represent that particular value exactly, so what gets stored is the closest value that is representable in that format, which is 10190150730169266176.0. Now we divide that approximation by 1000, again giving a floating-point result. The exact quotient, 10190150730169266.176, is again not exactly representable in the binary64 format, and we get the closest representable float, which happens to be 10190150730169266.0. Taking a remainder modulo 10 gives 6.

In bc and Python 2, we're working with arbitrary-precision integers and integer division. Both those languages can represent the numerator exactly. The division result is then 10190150730169267 (we're doing integer division, not true division, so the fractional part is discarded), and the remainder modulo 10 is 7. (This is oversimplifying a bit: the format that bc is using internally is somewhat closer to Python's Decimal type than to an arbitrary-precision integer type, but in this case the effect is the same.)

In Python 3, we're working with arbitrary-precision integers and true division. The numerator is represented exactly, but the result of the division is the nearest floating-point value to the true quotient. In this case the exact quotient is 10190150730169267.102, and the closest representable floating-point value is 10190150730169268.0. Taking the remainder of that value modulo 10 gives 8.

Summary:

  • Perl, awk, R: floating-point approximations, true division
  • bc, Python 2: arbitrary-precision integers, integer division
  • Python 3: arbitrary-precision integers, true division
Mark Dickinson
  • 29,088
  • 9
  • 83
  • 120
3

I can answer only for the difference between python 2 and python 3. "/" is integer division in python 2 while it is real division in python 3 (that's where the .0 in python 3 comes from. The output is floating point.

To summarize:

  • Python 2

    10190150730169267102/1000%10 
    

    equals

    10190150730169267%10
    

    equals

    7
    
  • Python 3

    10190150730169267102/1000%10 
    

    equals

    10190150730169267,102%10
    

    equals

    7.102 
    

but because of internal representation it's (wrongly) computed to 8.0

You may note that the correct answer may be 7 or 7.102 depending if we consider the division to be floating point or integer. So only Python(2) and bc have correct answers. And python 3 would have correct answer with integer division (10190150730169267102//1000%10).

Python supports arbitrarily large integers natively!

terdon
  • 3,260
  • 5
  • 33
  • 57
Julien
  • 1,810
  • 1
  • 16
  • 34
1

in perl6

➜  ~  perl6 -e 'say(10190150730169267102 div 1000 mod 10)'
7
➜  ~  perl6 -e 'say(10190150730169267102/1000%10)'
7.102

so, If you are not sure which language is correct, try to ask Perl6. :)

echo
  • 2,666
  • 1
  • 25
  • 17