-1

I am using Python 3.8

Basically, given a set of x and y values, I wrote something that gives me an approximation using linear regression but for exponential models. I am wondering because the expected output of 2 1 is 100% accurate which I made to be on purpose. I chose powers of 2. The first version of the program gives me a like 99.99999% accurate answer, but the second one gives me a 100% accurate answer. Could anyone tell me why this is? Also, I am wondering if log2 is generally more accurate to use than log10, or if this is just a "fluke" due to the specific values I had in points Also, I looked at the "Similar questions" box and nothing came up that seemed similar to my question.

Version 1:

from math import log10, exp
from decimal import Decimal


points = {'1': '2', '2': '4', '5': '32', '6': '64'}
t = Decimal(str(len(points)))
p = Decimal('0')
q = Decimal('0')
r = Decimal('0')
s = Decimal('0')
for key, value in points.items():
    p += Decimal(key)
    q += Decimal(key) * Decimal(key)
    r += Decimal(log10(Decimal(value)))
    s += Decimal(key) * Decimal(log10(Decimal(value)))
a = (t * s - p * r) / (t * q - p * p)
b = (r - a * p) / t
a = Decimal(Decimal('10') ** a)
b = Decimal(Decimal('10') ** b)
print(a, b)

This is the result I got:

1.999999999999999960280230758 1.000000000000000060150187915

Version 2:

from math import log2
from decimal import Decimal


points = {'1': '2', '2': '4', '5': '32', '6': '64'}
t = Decimal(str(len(points)))
p = Decimal('0')
q = Decimal('0')
r = Decimal('0')
s = Decimal('0')
for key, value in points.items():
    p += Decimal(key)
    q += Decimal(key) * Decimal(key)
    r += Decimal(log2(Decimal(value)))
    s += Decimal(key) * Decimal(log2(Decimal(value)))
a = (t * s - p * r) / (t * q - p * p)
b = (r - a * p) / t
a = Decimal(Decimal('2') ** a)
b = Decimal(Decimal('2') ** b)
print(a, b)

This is the result I got:

2 1

I tried another test with values that are neither powers of 10 or 2, but I don't know which one is more accurate:

from math import log2, log10
from decimal import Decimal


points = {'1': '3', '2': '7', '5': '26', '6': '62'}
t = Decimal(str(len(points)))
p = Decimal('0')
q = Decimal('0')
r = Decimal('0')
s = Decimal('0')
for key, value in points.items():
    p += Decimal(key)
    q += Decimal(key) * Decimal(key)
    r += Decimal(log2(Decimal(value)))
    s += Decimal(key) * Decimal(log2(Decimal(value)))
a = (t * s - p * r) / (t * q - p * p)
b = (r - a * p) / t
a = Decimal(Decimal('2') ** a)
b = Decimal(Decimal('2') ** b)
print(a, b)


points = {'1': '3', '2': '7', '5': '26', '6': '62'}
t = Decimal(str(len(points)))
p = Decimal('0')
q = Decimal('0')
r = Decimal('0')
s = Decimal('0')
for key, value in points.items():
    p += Decimal(key)
    q += Decimal(key) * Decimal(key)
    r += Decimal(log10(Decimal(value)))
    s += Decimal(key) * Decimal(log10(Decimal(value)))
a = (t * s - p * r) / (t * q - p * p)
b = (r - a * p) / t
a = Decimal(Decimal('10') ** a)
b = Decimal(Decimal('10') ** b)
print(a, b)

My result is:

1.752690522119211309696792535 1.902951630457180834208048103
1.752690522119211322362489659 1.902951630457180751134992738
  • 4
    All your values are powers of 2, so `log2(value)` will be an integer. But `log10(value)` is irrational, so you get an approximation. – Barmar Jan 29 '20 at 22:19
  • You are destroying the precision of your `Decimal`s by using functions from the `math` module on them - which necessarily turns them into `float`s. Use the corresponding methods of your `Decimal` objects instead. – jasonharper Jan 29 '20 at 22:38
  • @jasonharper Even if he uses `decimal.log10()`, it's still going to introduce errors, since there's no exact value of this. – Barmar Jan 29 '20 at 22:44

2 Answers2

0

When you calculate the logarithm of a number, you'll generally get an irrational result unless the number is an power of the base of the logarithm. Irrational numbers can't be represented precisely in decimal (or any other base), so you get an approximation. And as you perform more calculations, the errors propagate.

I don't think there's any way to know which of the two results in the last test is more accurate. They're both the same in the first 16 digits, which is the precision of double precision floating point. Even though you're using the decimal library, it may use floating point internally to calculate transcendental functions. I suspect they only differ in the last 1-2 bits of the binary result, so they're really close. But errors can get magnified as you do more calculations with the results.

If you need accuracy, avoid using transcendental functions.

See also Is floating point math broken?

Barmar
  • 741,623
  • 53
  • 500
  • 612
0

I'm not aware of any mathematical reason why log2 of an arbitrary number (integer or not) would be more likely to result in an irrational number - and therefore loss of precision - than log10 or any other log. Unfortunately there will always be some loss of precision unless the results of each calculation happen to be rational (powers of the log base).

You could check the loss of precision associated with taking logarithms for a specific series of data like this:

def test_precisions(iterable):
    precision_tests = {  # take the log and reverse, several ways
        "ln": lambda n: n.ln().exp(),
        "log10": lambda n: Decimal(10) ** n.log10(),
        "log2": lambda n: Decimal(2) ** (n.ln() / Decimal(2).ln()),
        "math.log2": lambda n: Decimal(2) ** Decimal(log2(n))
        }
    L = list(iterable)
    print('\nTesting {}, {}, {}...{}'.format(L[0], L[1], L[2], L[-1]))

    # Report average loss of precision from each log and reverse function
    # over the numbers in the range
    for name, test in precision_tests.items():
        print(name, sum(test(n) - n for n in L) / len(L))

def dec_range(start, stop, step=1):
    """ Like `range` but yields decimals """
    start = Decimal(start)
    stop = Decimal(stop)
    step = Decimal(step)
    steps = int((stop - start) / step)
    for i in range(steps):
        yield start + i * step

test_precisions(dec_range(1, 1000, 10))  # ln seems best
test_precisions(dec_range(1, 100))       # ln seems best
test_precisions(dec_range(1, 100, '.1')) # log10 seems best
test_precisions(dec_range(1, 100, Decimal(1)/7))  # log10 seems best

For these ranges there doesn't seem to be much difference between the Decimal methods ln and log10, and both generally do a lot better than math.log2 or Decimal(x).ln / Decimal(2).ln.

Stuart
  • 9,597
  • 1
  • 21
  • 30