How big a difference? Tiny differences are to be expected. Floating-point addition is commutative but not associative, so the way the operations are grouped changes the result. That is:
serialised_total = 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1
parallelised_total = (0.1 + 0.1 + 0.1) + (0.1 + 0.1 + 0.1)
# No actual parallelisation is performed. The above is just an example of how
# the serialised summation could be broken up into two separate summations.
assert serialised_total != parallelised_total
# 0.6 != 0.6000000000000001
The results on each side are still very close; they're just not exactly the same. See this answer for why.
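Because of this, it's usually better to compare floating-point results with a tolerance rather than with ==. Python's math.isclose is one way to do that; here is a minimal sketch using the totals from above:

import math

a = 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1
b = (0.1 + 0.1 + 0.1) + (0.1 + 0.1 + 0.1)
print(a == b)              # False: the last bits differ
print(math.isclose(a, b))  # True: equal within the default relative tolerance of 1e-09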
If you are using the GPU, it will be making use of parallelisation, so the order of operations will not be the same. For instance, if you sum a series of floating-point values, you can speed things up by breaking the list into chunks and sending each chunk to a different core to be summed, then summing the per-chunk results. This is much quicker, but the order of operations is different from summing the values serially, as the sketch below shows.
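Here is a minimal sketch of chunked summation in plain Python. No real parallelisation happens here; the chunking alone is enough to change the grouping, and hence the result. The values and chunk size are arbitrary choices for illustration:

values = [0.1] * 6

# Serial sum: strictly left to right.
serial_total = 0.0
for v in values:
    serial_total += v

# "Parallel" sum: sum each chunk, then sum the chunk totals.
# A real implementation would send each chunk to a different core.
chunk_size = 3
chunk_totals = [sum(values[i:i + chunk_size])
                for i in range(0, len(values), chunk_size)]
chunked_total = sum(chunk_totals)

print(serial_total)   # 0.6
print(chunked_total)  # 0.6000000000000001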
In the example above, it is the "parallelised" total that is less accurate than the "serialised" total. That is not a rule, though; sometimes the "parallelised" total is the more accurate one. For example:
# n = 8
serialised_total = 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1
parallelised_total = (0.1 + 0.1 + 0.1 + 0.1) + (0.1 + 0.1 + 0.1 + 0.1)
assert serialised_total != parallelised_total
# 0.7999999999999999 != 0.8
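If you want to know which grouping is actually closer to the true value, Python's math.fsum computes a correctly rounded sum and makes a handy reference:

import math

values = [0.1] * 10
print(sum(values))        # 0.9999999999999999 (plain left-to-right sum)
print(math.fsum(values))  # 1.0 (correctly rounded, the best possible result)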
Without knowing more about your problem, any answer is just speculation about the issue. Including this one.