This is to be expected. Firstly, the forward pass is also a lot slower: with your for loop, Python dispatches the following requests to PyTorch `batch_size` times:
- fetch the `i`-th element of `x`
- add 1
- update the `i`-th element of `x` with the incremented value
Python is slow. In the second version, Python dispatches a single message, "add 1 everywhere", to PyTorch, whose backend is much faster than Python (not to mention the GPU acceleration it is capable of). This technique is called vectorization; it is not specific to PyTorch but applies to essentially all Python (and many other) math packages.
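A minimal sketch of the contrast (the names `batch_size` and `x` follow your snippet; the sizes and timings here are illustrative and will depend on your machine):

```python
import time

import torch

batch_size = 10_000
x = torch.zeros(batch_size)

# Version one: Python loop, batch_size separate dispatches to PyTorch
start = time.perf_counter()
for i in range(batch_size):
    x[i] = x[i] + 1
print(f"loop:       {time.perf_counter() - start:.4f} s")

# Version two: a single dispatch, "add 1 everywhere"
x = torch.zeros(batch_size)
start = time.perf_counter()
x = x + 1
print(f"vectorized: {time.perf_counter() - start:.4f} s")
```

On a typical machine the loop version is slower by orders of magnitude, and the gap only widens on a GPU.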
Secondly, for your backward pass, PyTorch needs to keep track of all operations applied to `x` and backpropagate through them. In the first case there are `batch_size` of them; in the second, just one. Again, vectorization wins.
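As a rough illustration of what autograd has to record in each case (a sketch, not your exact code; the `clone()` is there only because writing in place into a leaf tensor that requires gradients is not allowed):

```python
import torch

batch_size = 4

# Loop version: autograd records one indexed update per iteration
x = torch.zeros(batch_size, requires_grad=True)
y = x.clone()
for i in range(batch_size):
    y[i] = y[i] + 1
y.sum().backward()   # backpropagates through batch_size recorded ops
print(x.grad)        # tensor([1., 1., 1., 1.])

# Vectorized version: autograd records a single addition
x = torch.zeros(batch_size, requires_grad=True)
y = x + 1
y.sum().backward()   # backpropagates through one recorded op
print(x.grad)        # tensor([1., 1., 1., 1.])
```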