
When using log2() in gmpy2, it does not seem to be accurate after 16 digits. It works fine at 15 digits, but beyond that the answer is not correct when using mpz(mpfr(2) ** mpfr(x)). Do I need to change the precision? I thought Python by itself was accurate up to 53 digits.

Additionally, is there a way in gmpy2 to use a logarithm operation in bases besides 10 and 2? For example, base 8 or 16.

1 Answer


The standard Python float type is accurate to 53 bits, which is roughly 16 decimal digits. gmpy2 also uses a default precision of 53 bits. If you want more accurate results, you will need to increase the precision.
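For comparison, here is what plain Python floats do, with no gmpy2 involved (a stdlib-only sketch; the value is just an example wider than 53 bits):

```python
import math

a = 12345678901234567890          # 64 significant bits: too wide for a float
recovered = round(2 ** math.log2(a))

# No Python float can hold all 64 bits of a, so the round trip
# through log2 and exponentiation cannot return a exactly.
print(a.bit_length())             # 64
print(recovered == a)             # False
```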

>>> import gmpy2
>>> from gmpy2 import mpz,mpfr,log2
>>> a=12345678901234567890
>>> gmpy2.get_context().precision=70
>>> mpz(2**log2(a))
mpz(12345678901234567890L)

To calculate a logarithm in a different base, just use

>>> gmpy2.log(x)/gmpy2.log(base)
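The same change-of-base identity works with plain floats, too. A small sketch (the helper name `log_base` is mine, not a gmpy2 function), using math.log2 so that power-of-two cases stay exact:

```python
import math

def log_base(x, base):
    # change of base: log_b(x) = log2(x) / log2(base)
    return math.log2(x) / math.log2(base)

print(log_base(4096, 16))   # 3.0  (16**3 == 4096)
print(log_base(512, 8))     # 3.0  (8**3 == 512)
```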

Update

Recovering an exact integer result from a sequence of floating point calculations is generally not possible. Depending on the actual calculations, you can increase the precision until you get "close enough".

Let's look at the impact of precision. Note that a is 57 bits long so it cannot be exactly represented with 53 bits of floating point precision.

>>> a=123543221556677776
>>> a.bit_length()
57
>>> gmpy2.get_context().precision=53
>>> mpfr(a);2**log2(a)
mpfr('1.2354322155667778e+17')
mpfr('1.2354322155667752e+17')

Since conversion of a binary floating point number to decimal can introduce a conversion error, let's look at the results in binary.

>>> mpfr(a).digits(2);(2**log2(a)).digits(2)
('11011011011101001111001111100101101001011000011001001', 57, 53)
('11011011011101001111001111100101101001011000010111001', 57, 53)
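As a quick cross-check (a throwaway helper, not part of the original session), you can count how many leading bits the two strings above agree on:

```python
s1 = '11011011011101001111001111100101101001011000011001001'
s2 = '11011011011101001111001111100101101001011000010111001'

# index of the first bit where the two 53-bit strings disagree
first_diff = next(i for i, (x, y) in enumerate(zip(s1, s2)) if x != y)
print(first_diff)   # 46 -- the top 46 of the 53 bits agree
```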

Let's try increasing the precision to 57 bits.

>>> gmpy2.get_context().precision=57
>>> mpfr(a).digits(2);(2**log2(a)).digits(2)
('110110110111010011110011111001011010010110000110010010000', 57, 57)
('110110110111010011110011111001011010010110000110010011000', 57, 57)

Notice more bits are correct but there is still an error. Let's try 64 bits.

>>> gmpy2.get_context().precision=64
>>> mpfr(a);2**log2(a)
mpfr('123543221556677776.0',64)
mpfr('123543221556677775.953',64)
>>> mpfr(a).digits(2);(2**log2(a)).digits(2)
('1101101101110100111100111110010110100101100001100100100000000000', 57, 64)
('1101101101110100111100111110010110100101100001100100011111111010', 57, 64)

The large number of trailing 1's is roughly equivalent to trailing 9's in decimal.

Once you get "close enough", you can convert to an integer which will round the result to the expected value.
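gmpy2 may not be installed everywhere, but the same "raise the working precision, then round" idea can be sketched with the stdlib decimal module (the precision of 40 digits here is an arbitrary comfortable margin, not a required value):

```python
from decimal import Decimal, getcontext

a = 123543221556677776
getcontext().prec = 40                     # 40 decimal digits of working precision

# log2(a) via natural logs, then 2**x, then round to the nearest integer
x = Decimal(a).ln() / Decimal(2).ln()
recovered = int((Decimal(2) ** x).to_integral_value())

print(recovered == a)                      # True: "close enough" plus rounding
```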

Why isn't 57 bits sufficient? The MPFR library used by gmpy2 does perform correct rounding, but even a correctly rounded result carries a small error. Let's also look at the results using the floating-point values immediately above and below the correctly rounded value.

>>> gmpy2.get_context().precision=57
>>> b=log2(a)
>>> 2**gmpy2.next_below(b);2**log2(a);2**gmpy2.next_above(b)
mpfr('123543221556677746.0',57)
mpfr('123543221556677784.0',57)
mpfr('123543221556677822.0',57)

Notice that even a small change in b causes a much larger change in 2**b.
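This amplification has a simple calculus reading: the derivative of 2**b is ln(2) * 2**b, so near a, a tiny nudge in b moves 2**b by an enormous amount. A stdlib sketch of the factor:

```python
import math

a = 123543221556677776
# derivative of 2**b at b = log2(a):  ln(2) * 2**b  ==  ln(2) * a
sensitivity = math.log(2) * a
print(sensitivity)   # ~8.6e16: even one ulp of error in b is hugely magnified
```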

Update 2

Floating point arithmetic is only an approximation to the mathematical properties of real numbers. Some numbers are rational (they can be written as a fraction) but most numbers are irrational (they can never be written exactly as a fraction). Floating point arithmetic actually uses a rational approximation to a number.

I've skipped some of the details in the following - I assume all numbers are between 0 and 1.

With binary floating point (what most computers use), the denominator of the rational approximation must be a power of 2. Numbers like 1/2 or 1/4 can be represented exactly. Decimal floating point uses rational approximations that have a denominator that is a power of 10. Numbers like 1/2, 1/4, 1/5, and 1/20 can all be represented exactly. Neither can represent 1/3 exactly. A base-6 implementation of floating point arithmetic can represent 1/2 and 1/3 exactly but not 1/10. The precision of a particular format just specifies the maximum size of the numerator. There will always be some rational numbers that cannot be represented exactly by a given base.
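You can see the power-of-two denominator directly with the stdlib fractions module, which recovers the exact rational value of a binary float:

```python
from fractions import Fraction

f = Fraction(1 / 3)            # exact rational value of the float nearest 1/3
print(f.denominator == 2**54)  # True: the denominator is a power of 2
print(f == Fraction(1, 3))     # False: no power-of-two denominator hits 1/3
```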

Since irrational numbers can't be written as a fraction, they cannot be represented exactly in any base. Since logarithm and exponential functions almost always produce irrational values, the calculations are almost never exact. By increasing the precision, you can usually get "close enough", but you can never get an exact result.

There are programs that work symbolically - they remember that a is log2(n) and when you do 2**a, the exact value of a is returned. See SymPy.

casevh
  • I raised the precision to 100, but the output does not seem to work unless I use the code you provided. For example, log2(123543221556677776) ends up as mpfr('56.777793469605039610459389190493', 100), and if I try using that output it does not match the original. Using your code with the same number, it works. Does the precision have to match the exact number of bits required to solve the equation? – python_nube Oct 13 '15 at 00:39
  • I don't understand what you mean by "using that output". Can you show the exact commands you are trying? – casevh Oct 13 '15 at 02:42
  • log2(x): I copy and paste the output into mpz(mpfr(2)**mpfr(y)). x is a number larger than 16 digits and y is what I copy and paste from the output. I'm trying to use this for math homework, where I can just use the output from log2() to be correct. It seems only correct in memory, not in the output in the Python interpreter. – python_nube Oct 13 '15 at 03:29
  • How accurate is the provided value of `log2(x)`? You need consistent precision throughout the calculation, and you should increase the precision if needed. I edited my answer to include more examples. – casevh Oct 13 '15 at 03:42
  • I really apologize for dwelling on this. Is it not possible to get an exact answer? Is this a math problem or a library issue? I am not really good with math, but I have been researching different number bases and arguments for bases that would solve this problem, like dozenal or base 6 instead of base 10. I really like base 16, considering it's how processors work by default. I want to answer my homework without writing something like ..33333333. Am I making any sense? – python_nube Oct 13 '15 at 04:27
  • I've expanded my answer but basically that's how floating point math works. Also see http://stackoverflow.com/questions/588004/is-floating-point-math-broken – casevh Oct 13 '15 at 05:10