Working with very large numbers in python

Question

I subtract large numbers using awk, which works fine (subtracting Time_req from Time_res to find the lag), but I am unable to do the same in Python.

    ID             Time_req             Time_res     lag  
0   3000002  1455594303468741117  1455594303469326836  585728

Why does Python give the output below?

>>> 1455594303469326836 - 1455594303468741117
585719
and not 585728

I even tried

>>> long(1455594303469326836) - long(1455594303468741117)
585719L #still wrong

585719 is the correct answer. awk is wrong, python is right. — juanpa.arrivillaga, Mar 08 '17 at 08:19
More information about working with large numbers: http://stackoverflow.com/questions/538551/handling-very-large-numbers-in-python — Shaig Khaligli, Mar 08 '17 at 08:21
Just realised this, assumed awk was correct because google gives the same answer as awk — pythonRcpp, Mar 08 '17 at 08:50
you can't get that many significant digits in `awk`, Google might be using the same precision (I guess 64 bits) — karakfa, Mar 08 '17 at 14:53
You can get that many significant digits in awk but you need `gawk -M` to do it. See https://www.gnu.org/software/gawk/manual/gawk.html#MPFR-features. — Ed Morton, Mar 08 '17 at 16:57
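To see both answers side by side, here is a minimal Python 3 sketch. Python's `int` is arbitrary precision, so subtracting the integers is exact; converting each value to `float` first (an IEEE 754 double, which is what plain awk uses for its arithmetic) reproduces awk's 585728.

```python
# Timestamps from the question
time_req = 1455594303468741117
time_res = 1455594303469326836

# Python ints are arbitrary precision, so this result is exact
exact_lag = time_res - time_req

# Rounding each value to an IEEE 754 double first mimics plain awk
double_lag = int(float(time_res) - float(time_req))

print(exact_lag)   # 585719
print(double_lag)  # 585728
```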

score 1 · Answer 1 · answered Mar 08 '17 at 16:57

To do large number (arbitrary precision) math in awk you need gawk -M:

$ awk 'BEGIN{print 1455594303469326836 - 1455594303468741117}'
585728

$ awk -M 'BEGIN{print 1455594303469326836 - 1455594303468741117}'
585719

$ awk --version | head -2
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)
Copyright (C) 1989, 1991-2016 Free Software Foundation.

See https://www.gnu.org/software/gawk/manual/gawk.html#MPFR-features for more details.

dawg · Answer 2 · 2017-03-09T02:55:54.810

For gawk you need to specify that you want to use the bignum package (this only works if gawk was compiled with it, i.e. linked against GNU MPFR and GMP):

$ gawk -M 'BEGIN{print 1455594303469326836 - 1455594303468741117}'
585719

(gawk --bignum 'prog' is equivalent to gawk -M 'prog'.)

Without the -M switch, you can see the precision loss by forcing awk to convert each input string to a number (a double) by adding 0 to it. Note that the value after => is not the same as the value before it:

$ echo "1455594303469326836 1455594303468741117" | awk  '{print $1 " => " $1+0,ORS $2 " => " $2+0}'
1455594303469326836 => 1455594303469326848 
1455594303468741117 => 1455594303468741120

vs

$ echo "1455594303469326836 1455594303468741117" | awk -M '{print $1 " => " $1+0,ORS $2 " => " $2+0}'
1455594303469326836 => 1455594303469326836 
1455594303468741117 => 1455594303468741117
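For comparison, a rough Python analogue of the `$1+0` trick, assuming the same IEEE 754 doubles: `float()` parses each string to the nearest double, and `int()` then exposes exactly which integer that double holds. The output matches the plain-awk column above.

```python
fields = "1455594303469326836 1455594303468741117".split()

# float() rounds each string to the nearest IEEE 754 double (like awk's $1+0);
# int() shows the exact integer value that double holds
rounded = {s: int(float(s)) for s in fields}

for s, as_double in rounded.items():
    print(s, "=>", as_double)
# 1455594303469326836 => 1455594303469326848
# 1455594303468741117 => 1455594303468741120
```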

Since IEEE 754 doubles have 53 bits of precision in the significand (enough to represent every integer up to 2**53 exactly), they start to lose the ability to represent an integer exactly once it needs more than 53 bits:

$ awk 'BEGIN{print 2**53, 2**53+1}'
9007199254740992 9007199254740992
               ^                ^       not +1 in least significant digit

$ awk -M 'BEGIN{print 2**53, 2**53+1}'
9007199254740992 9007199254740993       
               ^                ^       fixed...
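The same boundary can be checked in Python, where `int` stays exact but `float` does not. Every integer up to 2**53 in magnitude fits in a double, and 2**53 + 1 is the smallest positive integer that does not, so in double arithmetic it rounds back to 2**53, just as plain awk prints above.

```python
n = 2**53

# Python ints are arbitrary precision, so the +1 is always kept
print(n, n + 1)                  # 9007199254740992 9007199254740993

# In double precision 2**53 + 1 is not representable and rounds back to 2**53
print(float(n) + 1 == float(n))  # True
print(int(float(n) + 1))         # 9007199254740992
```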

Your input requires 61 bits to represent exactly (or 62 bits with a sign bit), so the least significant digits of the input are lost.
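The bit count can be confirmed in Python with `int.bit_length()`. A 53-bit significand leaves 61 - 53 = 8 low bits that cannot be stored, so adjacent doubles in this range are 2**8 = 256 apart. This sketch uses `math.ulp`, which assumes Python 3.9 or later, to report that spacing.

```python
import math

time_req = 1455594303468741117
time_res = 1455594303469326836

bits = time_res.bit_length()         # bits needed for the magnitude
lost_bits = bits - 53                # bits a double's significand cannot hold
spacing = math.ulp(float(time_res))  # gap between adjacent doubles here

print(time_req.bit_length(), bits)       # 61 61
print(lost_bits, 2**lost_bits, spacing)  # 8 256 256.0
```

With a spacing of 256, each input can be off by up to 128 after rounding, which is why the two results differ by 9.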

Options

If you do not have gawk with the bignum option, you can use perl with the bignum pragma:

$ perl -Mbignum -E 'say 1455594303469326836 - 1455594303468741117'
585719

python:

$ python -c 'print(1455594303469326836 - 1455594303468741117)'
585719

bc:

$ echo "1455594303469326836 - 1455594303468741117" | bc
585719

ruby:

$ ruby -e "puts 1455594303469326836 - 1455594303468741117"
585719

But basic POSIX awk is no bueno for arbitrary-precision integer or non-IEEE 754 floating point math. All arithmetic in POSIX awk (or gawk without bignum) is done in IEEE double precision, which cannot exactly represent integers as large as your input.
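For the actual task in the question (computing the lag per row), a minimal Python sketch is to parse the timestamps with `int()` rather than `float()`, so no digits are lost. It assumes whitespace-separated records in the column order shown in the question (ID Time_req Time_res); the inline `records` string is a hypothetical stand-in for the real file.

```python
# Hypothetical input in the question's column order: ID Time_req Time_res
records = """\
3000002  1455594303468741117  1455594303469326836
"""

lags = {}
for line in records.splitlines():
    rec_id, time_req, time_res = line.split()
    # int() keeps every digit; float() would round to the nearest double
    lags[rec_id] = int(time_res) - int(time_req)

print(lags)  # {'3000002': 585719}
```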

Further to your point, the OP's integers require 62 bits, and IEEE 754 supply only 53 bits of precision. Where is it written, though, that awk uses double precision? The awk man pages I have for OS X and NetBSD don't mention that. — James K. Lowden, Mar 08 '17 at 20:30
I am mostly relying on GNU documentation: [POSIX awk uses double-precision floating-point numbers](https://www.gnu.org/software/gawk/manual/html_node/Computer-Arithmetic.html) and [By default, gawk uses the double-precision floating-point values supplied by the hardware of the system it runs on.](https://www.gnu.org/software/gawk/manual/gawk.html#MPFR-features) — dawg, Mar 08 '17 at 20:39
But OpenGroup has [A numeric value that is exactly equal to the value of an integer (see Concepts Derived from the ISO C Standard) shall be converted to a string by the equivalent of a call to the sprintf function (see String Functions) with the string "%d"](http://pubs.opengroup.org/onlinepubs/9699919799/) — dawg, Mar 08 '17 at 20:39
Yup, that's what I'm seeing. `awk '{printf "%d - %d\n", 1455594303469326836, 1455594303468741117}'` produces `1455594303469326848 - 1455594303468741120`. Give that to **expr**(1), and you get `585719`. — James K. Lowden, Mar 08 '17 at 20:49
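The POSIX rule quoted above explains why plain awk prints the rounded doubles as full integers instead of in exponent form. Below is a hypothetical helper, `awk_number_to_string`, that approximates that rule in Python: integral values are formatted with "%d", everything else with CONVFMT/OFMT, whose POSIX default is "%.6g". It is a sketch of the rule, not awk's actual implementation.

```python
def awk_number_to_string(value: float, fmt: str = "%.6g") -> str:
    """Hypothetical approximation of awk's number-to-string rule:
    integral values use "%d", others use fmt (default "%.6g")."""
    if value.is_integer():
        return "%d" % value
    return fmt % value

a = 1455594303469326836
b = 1455594303468741117

# Each literal is first rounded to a double, as in plain awk
print(awk_number_to_string(float(a)), "-", awk_number_to_string(float(b)))
# 1455594303469326848 - 1455594303468741120
print(awk_number_to_string(float(a) - float(b)))  # 585728
print(awk_number_to_string(1 / 3))                # 0.333333
```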

