Questions tagged [gradient-descent]

Gradient descent is an algorithm for finding a minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and takes steps proportional to the negative of that gradient. One major application of gradient descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.
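The update described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper name `gradient_descent`, the example function f(x, y) = (x - 3)^2 + (y + 1)^2, the learning rate, and the step count are all hypothetical choices, not part of the tag wiki.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient, starting from `start`.

    `grad` is a function returning the gradient (a list of partial
    derivatives) at a given point. All names here are illustrative.
    """
    point = list(start)
    for _ in range(steps):
        g = grad(point)
        # Move each coordinate proportionally to the negative gradient.
        point = [p - learning_rate * gi for p, gi in zip(point, g)]
    return point


def grad_f(p):
    """Gradient of the assumed example f(x, y) = (x - 3)^2 + (y + 1)^2."""
    x, y = p
    return [2 * (x - 3), 2 * (y + 1)]


minimum = gradient_descent(grad_f, start=[0.0, 0.0])
print(minimum)  # close to the true minimizer (3, -1)
```

For this convex example any sufficiently small learning rate converges; on other functions the step size usually needs tuning, and only a local minimum is guaranteed.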

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.
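As a rough illustration of that trade-off, the sketch below fits a one-feature linear model two ways: with the closed-form least-squares formulas and with gradient descent on the mean squared error. The data, the function names, and the hyperparameters are assumptions made up for this example. On a problem this small the analytic answer is available directly, so gradient descent is shown only for comparison.

```python
# Toy data generated from y = 2x + 1 (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
n = len(xs)


def fit_closed_form(xs, ys):
    """Ordinary least squares for one feature, solved analytically."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def fit_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Minimize the mean squared error by stepping against its gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w_cf, b_cf = fit_closed_form(xs, ys)
w_gd, b_gd = fit_gradient_descent(xs, ys)
print(w_cf, b_cf)  # analytic solution
print(w_gd, b_gd)  # iterative solution, should agree closely
```

When no closed form exists (for example, most neural networks) or the data is too large for a direct solve, only the iterative route remains practical.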

Gradient descent is also known as steepest descent, or the method of steepest descent.

Tag usage:

Questions on gradient-descent should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the StackExchange site for statistics, machine learning and data analysis.


1428 questions

331

votes

7 answers

Why do we need to call zero_grad() in PyTorch?

Why does zero_grad() need to be called during training? | zero_grad(self) | Sets gradients of all model parameters to zero.

python neural-network deep-learning pytorch gradient-descent

asked Dec 28 '17 at 04:31

user1424739


166

votes

6 answers

pytorch - connection between loss.backward() and optimizer.step()

Where is the explicit connection between the optimizer and the loss? How does the optimizer know where to get the gradients of the loss without a call like optimizer.step(loss)? -More context- When I minimize the loss, I didn't have to pass the…

machine-learning neural-network pytorch gradient-descent

asked Dec 30 '18 at 06:30

aerin


139

votes

4 answers

Pytorch, what are the gradient arguments

I am reading through the documentation of PyTorch and found an example where they write gradients = torch.FloatTensor([0.1, 1.0, 0.0001]) y.backward(gradients) print(x.grad) where x was an initial variable, from which y was constructed (a…

neural-network gradient pytorch torch gradient-descent

asked Apr 17 '17 at 12:04

Qubix


125

votes

8 answers

Why should weights of Neural Networks be initialized to random numbers?

I am trying to build a neural network from scratch. Across all AI literature there is a consensus that weights should be initialized to random numbers in order for the network to converge faster. But why are neural networks' initial weights…

machine-learning neural-network artificial-intelligence mathematical-optimization gradient-descent

asked Nov 17 '13 at 05:34

Shayan RC


123

votes

6 answers

Common causes of nans during training of neural networks

I've noticed that NaNs are frequently introduced during training. Often they seem to be introduced by weights in inner-product/fully-connected or convolution layers blowing up. Is this occurring because the gradient computation…

machine-learning neural-network deep-learning caffe gradient-descent

asked Nov 27 '15 at 17:23

Aidan Gomez


103

votes

4 answers

How to do gradient clipping in pytorch?

What is the correct way to perform gradient clipping in pytorch? I have an exploding gradients problem.

python machine-learning deep-learning pytorch gradient-descent

asked Feb 15 '19 at 20:09

Gulzar


votes

10 answers

Neural network always predicts the same class

I'm trying to implement a neural network that classifies images into one of the two discrete categories. The problem is, however, that it currently always predicts 0 for any input and I'm not really sure why. Here's my feature extraction method: def…

python-3.x numpy neural-network deep-learning gradient-descent

asked Jan 05 '17 at 15:06

Yurii Dolhikh


votes

5 answers

What is the difference between Gradient Descent and Newton's Gradient Descent?

I understand what Gradient Descent does. Basically it tries to move towards the local optimal solution by slowly moving down the curve. I am trying to understand what is the actual difference between the plain gradient descent and the Newton's…

machine-learning data-mining mathematical-optimization gradient-descent newtons-method

asked Aug 22 '12 at 05:27

London guy


votes

4 answers

why gradient descent when we can solve linear regression analytically

What is the benefit of using gradient descent for linear regression? It looks like we can solve the problem (finding the theta0-n that minimize the cost function) with an analytical method, so why would we still want to use gradient descent to do the same…

machine-learning linear-regression gradient-descent

asked Aug 12 '13 at 16:18

John


votes

5 answers

gradient descent using python and numpy

def gradient(X_norm,y,theta,alpha,m,n,num_it): temp=np.array(np.zeros_like(theta,float)) for i in range(0,num_it): h=np.dot(X_norm,theta) #temp[j]=theta[j]-(alpha/m)*( np.sum( (h-y)*X_norm[:,j][np.newaxis,:] ) ) …

python numpy machine-learning linear-regression gradient-descent

asked Jul 22 '13 at 09:55

Madan Ram

votes

4 answers

Why do we need to explicitly call zero_grad()?

Why do we need to explicitly zero the gradients in PyTorch? Why can't gradients be zeroed when loss.backward() is called? What scenario is served by keeping the gradients on the graph and asking the user to explicitly zero the gradients?

neural-network deep-learning pytorch gradient-descent

asked Jun 24 '17 at 02:39

Wasi Ahmad


votes

5 answers

pytorch how to set .requires_grad False

I want to set some of my model frozen. Following the official docs: with torch.no_grad(): linear = nn.Linear(1, 1) linear.eval() print(linear.weight.requires_grad) But it prints True instead of False. If I want to set the model in eval…

python pytorch gradient-descent

asked Aug 08 '18 at 13:36

Qian Wang

votes

4 answers

What is the difference between SGD and back-propagation?

Can you please tell me the difference between Stochastic Gradient Descent (SGD) and back-propagation?

machine-learning artificial-intelligence difference backpropagation gradient-descent

asked Jun 21 '16 at 20:02

Влад Концевич

votes

1 answer

Sklearn SGDClassifier partial fit

I'm trying to use SGD to classify a large dataset. As the data is too large to fit into memory, I'd like to use the partial_fit method to train the classifier. I have selected a sample of the dataset (100,000 rows) that fits into memory to test fit…

python machine-learning scikit-learn gradient-descent

asked Jul 07 '14 at 18:31

David M.


votes

5 answers

How to calculate optimal batch size?

Sometimes I run into a problem: OOM when allocating tensor with shape e.g. OOM when allocating tensor with shape (1024, 100, 160) Where 1024 is my batch size and I don't know what's the rest. If I reduce the batch size or the number of neurons in…

machine-learning neural-network deep-learning keras gradient-descent

asked Oct 09 '17 at 20:25

Andrzej Gis

