
I subtract large numbers with awk, which seems to work fine (Time_res - Time_req gives the lag), but I cannot get the same result in Python.

    ID             Time_req             Time_res     lag  
0   3000002  1455594303468741117  1455594303469326836  585728

Why does Python give the output below?

>>> 1455594303469326836 - 1455594303468741117
585719
and not 585728

I even tried

>>> long(1455594303469326836) - long(1455594303468741117)
585719L #still wrong
pythonRcpp

2 Answers


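Python's 585719 is the correct result; the 585728 you get from plain awk comes from floating-point rounding. To do arbitrary-precision arithmetic in awk you need GNU awk with the -M option (awk here is gawk, as the version output below shows):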

$ awk 'BEGIN{print 1455594303469326836 - 1455594303468741117}'
585728

$ awk -M 'BEGIN{print 1455594303469326836 - 1455594303468741117}'
585719

$ awk --version | head -2
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)
Copyright (C) 1989, 1991-2016 Free Software Foundation.

See https://www.gnu.org/software/gawk/manual/gawk.html#MPFR-features for more details.

Ed Morton

With gawk you must explicitly request arbitrary-precision arithmetic with -M (available only if gawk was built with MPFR and GMP support):

$ gawk -M 'BEGIN{print 1455594303469326836 - 1455594303468741117}'
585719

(gawk --bignum 'prog' is the long form of gawk -M.)

Without the -M switch, you can see the loss of precision by forcing each input string to a number (adding 0 to it). Note that the value after => is not the same as the original:

$ echo "1455594303469326836 1455594303468741117" | awk  '{print $1 " => " $1+0,ORS $2 " => " $2+0}'
1455594303469326836 => 1455594303469326848 
1455594303468741117 => 1455594303468741120

vs

$ echo "1455594303469326836 1455594303468741117" | awk -M '{print $1 " => " $1+0,ORS $2 " => " $2+0}'
1455594303469326836 => 1455594303469326836 
1455594303468741117 => 1455594303468741117
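
The same rounding can be reproduced in Python by forcing the values through float, which is an IEEE 754 double on typical platforms (an assumption here, as is Python 3). This is only a sketch to show where 585728 comes from; the variable names are mine, not from the question:

```python
# The two timestamps from the question, as exact Python integers.
t_req = 1455594303468741117
t_res = 1455594303469326836

# Python ints are arbitrary precision, so this difference is exact.
exact_lag = t_res - t_req

# Converting to float rounds each value to the nearest double,
# which mimics what awk does without -M.
rounded_req = int(float(t_req))
rounded_res = int(float(t_res))
double_lag = rounded_res - rounded_req

print(exact_lag)    # 585719
print(rounded_res)  # 1455594303469326848
print(rounded_req)  # 1455594303468741120
print(double_lag)   # 585728
```

The rounded values match the awk output above, and their difference is the 585728 from the question.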

IEEE 754 doubles have a 53-bit significand, so they can represent every integer exactly only up to 2^53. Integers that need more than 53 bits generally get rounded:

$ awk 'BEGIN{print 2**53, 2**53+1}'
9007199254740992 9007199254740992
               ^                ^       not +1 in least significant digit

$ awk -M 'BEGIN{print 2**53, 2**53+1}'
9007199254740992 9007199254740993       
               ^                ^       fixed...

Your input requires 61 bits to represent exactly (or 62 bits with the sign bit), so you lose the least significant digits of the input.
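
If you want to check the bit counts yourself, a small Python sketch along these lines should do it (assuming Python 3 and IEEE 754 doubles; the variable names are just for illustration):

```python
value = 1455594303469326836

# Number of bits needed to hold the magnitude exactly.
bits_needed = value.bit_length()

# A double has a 53-bit significand, so anything wider gets rounded.
fits_in_double = bits_needed <= 53

# Gap between adjacent doubles at this magnitude: 2**(bits_needed - 53).
spacing = 2 ** (bits_needed - 53)

# 2**53 + 1 is the first positive integer a double cannot hold exactly.
collapses = float(2**53 + 1) == float(2**53)

print(bits_needed)     # 61
print(fits_in_double)  # False
print(spacing)         # 256
print(collapses)       # True
```

A spacing of 256 means each timestamp can be off by up to 128 once it is stored as a double, which is why the lag is off by a few units.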


Options

If you do not have gawk with the bignum option, you can use perl with the bignum pragma:

$ perl -Mbignum -E 'say 1455594303469326836 - 1455594303468741117'
585719

python:

$ python -c 'print(1455594303469326836 - 1455594303468741117)'
585719

bc:

$ echo "1455594303469326836 - 1455594303468741117" | bc
585719

ruby:

$ ruby -e "puts 1455594303469326836 - 1455594303468741117"
585719

But basic POSIX awk -- no bueno for arbitrary-precision integer math or non-IEEE 754 floating-point math. All arithmetic in POSIX awk (or gawk without bignum) is done in IEEE double precision, which cannot exactly represent integers of the size in your input.
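
If you end up doing this in Python, the main thing is to parse the fields with int() and never pass them through float. A minimal sketch, assuming whitespace-separated ID, Time_req and Time_res fields as in the question (compute_lag is a hypothetical helper name, not something from the question):

```python
def compute_lag(line):
    """Return (id, lag) for a line of 'ID Time_req Time_res' fields."""
    record_id, time_req, time_res = line.split()[:3]
    # int() parses the decimal strings exactly, whatever their size.
    return record_id, int(time_res) - int(time_req)


sample = "3000002  1455594303468741117  1455594303469326836"
record_id, lag = compute_lag(sample)
print(record_id, lag)  # 3000002 585719
```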

dawg
  • Further to your point, the OP's integers require 62 bits, and IEEE 754 supply only 53 bits of precision. Where is it written, though, that awk uses double precision? The awk man pages I have for OS X and NetBSD don't mention that. – James K. Lowden Mar 08 '17 at 20:30
  • I am mostly relying on GNU documentation: [POSIX awk uses double-precision floating-point numbers](https://www.gnu.org/software/gawk/manual/html_node/Computer-Arithmetic.html) and [By default, gawk uses the double-precision floating-point values supplied by the hardware of the system it runs on.](https://www.gnu.org/software/gawk/manual/gawk.html#MPFR-features) – dawg Mar 08 '17 at 20:39
  • But OpenGroup has [A numeric value that is exactly equal to the value of an integer (see Concepts Derived from the ISO C Standard) shall be converted to a string by the equivalent of a call to the sprintf function (see String Functions) with the string "%d"](http://pubs.opengroup.org/onlinepubs/9699919799/) – dawg Mar 08 '17 at 20:39
  • Yup, that's what I'm seeing. `awk '{printf "%d - %d\n", 1455594303469326836, 1455594303468741117}'` produces `1455594303469326848 - 1455594303468741120`. Give that to **expr**(1), and you get `585719`. – James K. Lowden Mar 08 '17 at 20:49