I was working with the tf.matmul op on multidimensional (batched) input arguments. For example, given the code below:
import tensorflow as tf
import numpy as np
nn_arch = [100,1200]
batch_size = 128
n_ = np.random.normal(size=[batch_size,nn_arch[1],nn_arch[0]])
w_ = np.random.normal(size=[nn_arch[1],nn_arch[0]])
x_ = np.random.normal(size=[batch_size,nn_arch[0],1])
n = tf.constant(n_)
w = tf.constant(w_)
x = tf.constant(x_)
nw = n + w            # broadcasts w over the batch dimension
c = tf.matmul(nw, x)  # batched matmul: (128,1200,100) @ (128,100,1) -> (128,1200,1)
sess = tf.InteractiveSession()
# Reference: the same batched product computed per example in NumPy
c_ = np.zeros(shape=[batch_size, nn_arch[1], 1])
for i in range(batch_size):
    c_[i, :, :] = (n_[i, :, :] + w_).dot(x_[i, :, :])
c_tf = sess.run(c)
print(c_tf - c_)
print(np.max(np.abs(c_tf - c_)))
The last print outputs a value on the order of 1e-15. Why is there this small difference between the two computations, which are basically doing the same calculation at the same precision (float64)?
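For comparison, here is a minimal sketch (pure NumPy, no TensorFlow, values of my own choosing) showing that float64 addition is not associative, so merely reducing the same values in a different order can produce differences of roughly this size:

import numpy as np
v = np.random.normal(size=100)
# Same float64 values, summed in two different orders
print(v.sum() - np.sort(v).sum())  # often a tiny nonzero value (~1e-16..1e-15), occasionally exactly 0

So is the discrepancy above just a matter of tf.matmul and np.dot accumulating the inner products in different orders, or is something else going on?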