What does PyTorch's SGD optimizer do if I feed it the whole dataset and never specify a batch size? I don't see anything "stochastic" or random in that case.
For example, in the following simple code I feed the whole dataset (x, y) into a model.
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for epoch in range(5):
    # forward pass over the entire dataset at once
    y_pred = model(x_data)
    loss = criterion(y_pred, y_data)
    # one parameter update per epoch, using the gradient of the full-data loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
Suppose there are 100 data pairs (x, y), i.e. x_data and y_data each have 100 elements.
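For concreteness, this is roughly the setup I have in mind (the linear model, MSE loss, and made-up data below are just placeholder assumptions so the snippet above has something to run on):

import torch

# Assumed setup: 100 data pairs, a simple linear model, and MSE loss
x_data = torch.randn(100, 1)
y_data = 3 * x_data + 0.5          # arbitrary made-up targets
model = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss()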
Question: It seems to me that all 100 gradients are computed before a single parameter update, so the size of the "mini-batch" is effectively 100, not 1, and there is no randomness. Am I right? Originally I thought SGD meant randomly choosing 1 data point and computing its gradient, to be used as an approximation of the true gradient over all the data.
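For contrast, what I imagined SGD doing is something like the following sketch (reusing the same model, criterion, x_data, and y_data as above; the DataLoader wrapper and batch_size=1 are my own assumptions, and shuffle=True is where I expected the randomness to come from, rather than from the optimizer itself):

from torch.utils.data import TensorDataset, DataLoader

# Hypothetical mini-batch version: sample one random point per update
dataset = TensorDataset(x_data, y_data)
loader = DataLoader(dataset, batch_size=1, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for epoch in range(5):
    for x_batch, y_batch in loader:
        y_pred = model(x_batch)
        loss = criterion(y_pred, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()   # one update per randomly drawn sample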