
I am running experiments on synthetic data (e.g. fitting a sine curve) and I get errors in PyTorch that are really small. One is about 2.00e-7. I was reading about machine precision and my errors seem really close to it. How do I know if this is going to cause problems (or if perhaps it already has, e.g. I can't differentiate between the different errors since they are effectively "machine zero")?

errors:

p = np.array([2.3078539778125768e-07,
               1.9997889411762922e-07,
               2.729681222011256e-07,
               3.2532371115080884e-07])

m = np.array([3.309504692539563e-07,
                 4.1058904888091606e-06,
                 6.8326703386053605e-06,
                 7.4616147721799645e-06])
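
One rough way I can think of to sanity-check whether these losses are even distinguishable in float32 (just a sketch, assuming numpy; `np.spacing` returns the gap to the next representable float at a given value):

import numpy as np

p = np.array([2.3078539778125768e-07, 1.9997889411762922e-07,
              2.729681222011256e-07, 3.2532371115080884e-07])
m = np.array([3.309504692539563e-07, 4.1058904888091606e-06,
              6.8326703386053605e-06, 7.4616147721799645e-06])

# gap between adjacent float32 values around each loss (its "ULP")
ulp = np.spacing(p.astype(np.float32))
print(ulp)                       # ~1e-14 near these magnitudes

# the losses are only "machine zero" relative to each other if their
# differences are comparable to that spacing -- here they are not
print(np.abs(p - m) > 10 * ulp)  # [ True  True  True  True]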

what confuses me is that I tried adding what I thought was too small a number to make a difference, but it did return a difference (i.e. I tried to get a + eps == a by using an eps smaller than machine precision):

import torch

x1 = torch.tensor(1e-6)
x2 = torch.tensor(1e-7)
x3 = torch.tensor(1e-8)
x4 = torch.tensor(1e-9)

eps = torch.tensor(1e-11)

print(x1.dtype)
print(x1)
print(x1+eps)

print(x2)
print(x2+eps)

print(x3)
print(x3+eps)

print(x4)
print(x4+eps)

output:

torch.float32
tensor(1.0000e-06)
tensor(1.0000e-06)
tensor(1.0000e-07)
tensor(1.0001e-07)
tensor(1.0000e-08)
tensor(1.0010e-08)
tensor(1.0000e-09)
tensor(1.0100e-09)

I expected each sum to come out unchanged (i.e. a + eps == a), but it didn't. Can someone explain to me what is going on? If I am getting losses close to 1e-7, should I use double rather than float? From googling, it seems that float means single precision, as far as I know.

If I want to use doubles, what are the pros/cons, and what is the least error-prone way to change my code? Is a single cast to double enough, or is there a global flag?


Useful reminder:

recall machine precision:

Machine precision is the smallest number ε such that the difference between 1 and 1 + ε is nonzero, i.e., it is the smallest difference between these two numbers that the computer recognizes. For IEEE-754 single precision this is 2^-23 (approximately 10^-7), while for IEEE-754 double precision it is 2^-52 (approximately 10^-16).
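
To see this definition concretely (a small sketch, assuming numpy is available):

import numpy as np

# machine epsilon for each precision
print(np.finfo(np.float32).eps)   # 1.1920929e-07          (2**-23)
print(np.finfo(np.float64).eps)   # 2.220446049250313e-16  (2**-52)

# adding something well below eps to 1 is rounded away,
# while adding something above eps is not
print(np.float32(1.0) + np.float32(1e-8) == np.float32(1.0))  # True
print(np.float32(1.0) + np.float32(2e-7) == np.float32(1.0))  # False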


Potential solution:

OK, let's see if this is a good summary of what I think is correct (modulo ignoring some details of floats that I don't fully understand right now, like the bias).

But I’ve concluded that the best thing for me is to make sure my errors/numbers have two properties:

1. They are within ~7 decimal digits of each other (due to the mantissa being 24 bits, as you pointed out: log_10(2^24) ≈ 7.22).
2. They are far enough from the edges. For this I take the mantissa to be 23 bits away from the lower edge (exponent position about -128 + 23) and the same for the largest edge, but 127 - 23.

As long as we satisfy that, more or less, we avoid adding two numbers whose difference is too small for the machine to register (condition 1) and we avoid overflows/underflows (condition 2). A rough check of both conditions is sketched below.
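
Here is a rough sketch of how I would check both conditions in code (assuming numpy; the exact thresholds are my own arbitrary choices, not anything standard):

import numpy as np

def float32_safe_to_add(a, b):
    """Rough heuristic for the two conditions above (float32 only)."""
    info = np.finfo(np.float32)
    a, b = abs(float(a)), abs(float(b))
    small, large = min(a, b), max(a, b)

    # condition 1: within ~7 decimal digits of each other, otherwise the
    # smaller value is rounded away when the two are added
    within_precision = small == 0 or (large / small) < 1.0 / info.eps

    # condition 2: comfortably away from the representable edges
    # (info.tiny ~ 1.2e-38 and info.max ~ 3.4e38 for float32)
    away_from_edges = (small == 0 or small > info.tiny * 1e3) and large < info.max / 1e3

    return within_precision and away_from_edges

print(float32_safe_to_add(1.0, 1e-11))   # False: 1e-11 vanishes next to 1.0
print(float32_safe_to_add(1e-8, 1e-11))  # True: both tiny, but close in magnitude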

Perhaps there is a small detail I might be missing with the bias or some other float detail (like representing infinity, NaN). But I believe that is correct.

If anyone can correct the details, that would be fantastic.



Charlie Parker

1 Answer

I think you misunderstood how floating points work. There are many good resources (e.g.) about what floating points are, so I am not going into details here.

The key is that floating-point numbers are dynamic in scale. They can represent the sum of very large values to a certain relative accuracy, or the sum of very small values to a certain relative accuracy, but not the sum of a very large value and a very small value. They adjust their range on the go.

So this is why your test result differs from the explanation of "machine precision" -- you are adding eps to very small values, but that paragraph explicitly says "1 + eps", and 1 is a much larger value than 1e-6. The following thus works as expected:

import torch

x1 = torch.tensor(1).float()
eps = torch.tensor(1e-11)

print(x1.dtype)
print(x1)
print(x1+eps)

Output:

torch.float32
tensor(1.)
tensor(1.)

The second question -- when should you use double?

Pros - higher accuracy.

Cons - much slower (hardware is generally optimized for float), and double the memory usage.

That really depends on your application. Most of the time I would just say no. As I said, you need double when very large values and very small values coexist in the network. That should not happen anyway with proper normalization of the data.

(Another reason is overflow of the exponent, i.e. when you need to represent very, very large or small values, beyond roughly 1e-38 and 1e38.)
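
Regarding the "single cast vs. global flag" part of the question: PyTorch supports both a per-tensor/per-module cast and a global default. A minimal sketch (assuming a reasonably recent PyTorch; `torch.set_default_dtype` only affects floating-point tensors created after the call):

import torch
import torch.nn as nn

# Option 1: cast specific tensors / modules to float64
x = torch.tensor(1e-6, dtype=torch.float64)   # or torch.tensor(1e-6).double()
model = nn.Linear(4, 1).double()              # casts the module's parameters in place

# Option 2: change the default so new floating-point tensors are float64
torch.set_default_dtype(torch.float64)
y = torch.tensor(1e-6)
print(y.dtype)  # torch.float64

# note: tensors created before the call keep their dtype, and data loaded
# as float32 (e.g. from numpy) still needs an explicit .double()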

hkchengrex
  • thanks for sharing the link. Apologies for not getting this, but what I can't figure out from the link you sent is why floating points are dynamic. I would have expected the bias to play a role in this, but I can't figure out how. If we always have a fixed bias, I would have expected that we are limited by the bit representation of the bias, so we are always forced into the -127 to 127 range? What am I missing? – Charlie Parker Sep 15 '20 at 16:57
  • @CharlieParker No, there isn't a bias. Let's just use the `1e-6` that is already in your question -- a floating-point-like representation would store two parts, the number before `e` (1) and the number after `e` (-6). Both of these numbers have a fixed range; together they can represent a much larger range of numbers. `1e-6+1e-6` works because we are only adding the numbers before `e`. `1e-0+1e-11` does not work because the number after `e` would remain `0`, meaning the number before `e` would need to be `1.000....1`, which cannot be represented in its fixed range. – hkchengrex Sep 15 '20 at 17:09
  • I think I understand that there are two numbers we store, the mantissa (the part around the dot/before the `e`) and the exponent (the `2^E`, or the number after the `e`). But we can still only store a fixed set of numbers, since the representation is fixed. I think one part that is confusing me is calling it "dynamic". With a fixed representation I cannot represent just any number, e.g. `1e-10000000000000000000000`. Somehow "dynamic" made me think we only needed to worry about the relative values, but that's not fully true. – Charlie Parker Sep 15 '20 at 17:19
  • What I'm inferring from your comment and link is that we can add things when they are in a nice range relative to each other. This is very vague, but if I had a more precise rule for when arithmetic would fail I would have posted it. I think that's what I'm trying to figure out right now. Now that I know it's not truly dynamic, I'm trying to understand, given the representation from the link you sent me, when arithmetic (adding, multiplying, etc.) would fail and in what range it would not. – Charlie Parker Sep 15 '20 at 17:21
  • @CharlieParker It will fail when the result cannot be represented with a 23-bit mantissa (23 binary digits) and an 8-bit exponent (roughly 1e-38 ~ 1e38; see the sketch after this thread). We usually don't need to worry about the precision of intermediate values as they will be extended. – hkchengrex Sep 16 '20 at 02:01
  • what does "We usually don't need to worry about the precision of intermediate values as they will be extended" mean? btw, thanks for the discussion! – Charlie Parker Sep 16 '20 at 18:47
  • @CharlieParker On second thought, I am not sure whether that statement is true. Forget about it. – hkchengrex Sep 16 '20 at 19:03
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/221593/discussion-between-charlie-parker-and-hkchengrex). – Charlie Parker Sep 16 '20 at 20:22
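
To make the limits mentioned in the comment thread above concrete, here is a small sketch (assuming numpy) that reads them off with `np.finfo` and reproduces the two additions discussed:

import numpy as np

info = np.finfo(np.float32)
print(info.nmant)   # 23 -> mantissa bits, ~7 decimal digits
print(info.tiny)    # ~1.18e-38, smallest normal float32
print(info.max)     # ~3.40e+38, largest float32

# the two cases from the comment thread:
a = np.float32(1e-6)
print(a + a == a)                                              # False: representable
print(np.float32(1.0) + np.float32(1e-11) == np.float32(1.0))  # True: 1e-11 is lost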