Questions tagged [sgd]
63 questions
14
votes
2 answers
SGD with momentum in TensorFlow
In Caffe, the SGD solver has a momentum parameter (link). In TensorFlow, tf.train.GradientDescentOptimizer does not have an explicit momentum parameter. However, there is a separate tf.train.MomentumOptimizer. Is it the…

A Das
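As background for the question above, here is a minimal sketch of SGD with momentum in plain Python. It assumes the commonly documented update rules: a Caffe-style velocity form (v = mu*v - lr*g, then w = w + v) and the accumulator form described for tf.train.MomentumOptimizer (accum = mu*accum + g, then w = w - lr*accum). The function names and the toy gradient are hypothetical, and neither function is the actual library implementation.

```python
def caffe_style_momentum(grad, w, lr, mu, steps):
    """Velocity form: the learning rate is folded into the velocity."""
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(w)
        w = w + v
    return w


def accumulator_style_momentum(grad, w, lr, mu, steps):
    """Accumulator form: gradients are accumulated, then scaled by lr."""
    accum = 0.0
    for _ in range(steps):
        accum = mu * accum + grad(w)
        w = w - lr * accum
    return w


# Toy objective f(w) = w**2, whose gradient is 2*w (an assumption for the demo).
toy_grad = lambda w: 2.0 * w

w_a = caffe_style_momentum(toy_grad, 5.0, lr=0.1, mu=0.9, steps=50)
w_b = accumulator_style_momentum(toy_grad, 5.0, lr=0.1, mu=0.9, steps=50)
print(w_a, w_b)  # the two forms agree when lr is constant
```

With a constant learning rate the two forms differ only by the factor -lr between the velocity and the accumulator, and setting mu to 0 reduces both to plain gradient descent.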
4
votes
2 answers
How to calculate maximum gradient for each layer given a mini-batch
I am trying to implement a fully-connected model for classification using the MNIST dataset. Part of the code is the following:
n = 5
act_func = 'relu'
classifier = tf.keras.models.Sequential()
classifier.add(layers.Flatten(input_shape = (28, 28,…

anastasia
4
votes
2 answers
Why can't SGDRegressor in sklearn converge to the correct optimum?
I was practicing with SGDRegressor in sklearn but ran into some problems, which I have simplified into the following code.
import numpy as np
from sklearn.linear_model import SGDRegressor
X = np.array([0,0.5,1]).reshape((3,1))
y =…

BlackieMia
3
votes
0 answers
ValueError: No gradients provided for any variable: ['x_hat:0']
I am working on transforming images to craft adversarial attacks on computer vision systems that are robust to rotation. I would like to find an x_hat image that optimizes a mean loss function over several random rotations. Here is how…

mad
3
votes
1 answer
Using SGD without using sklearn (LogLoss increasing with every epoch)
def train(X_train,y_train,X_test,y_test,epochs,alpha,eta0):
    w,b = initialize_weights(X_train[0])
    loss_test=[]
    N=len(X_train)
    for i in range(0,epochs):
        print(i)
        for j in range(N-1):
            …

raju
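For comparison with the question above, here is a minimal, self-contained sketch of logistic regression trained by SGD in plain Python, where the log loss goes down over epochs. It is not the asker's code: the helper names, the toy data, and the reading of alpha as an L2 penalty strength and eta0 as a constant learning rate are all assumptions. The point it illustrates is the sign of the update, since stepping along the gradient instead of against it is one common way to get a loss that rises every epoch.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def log_loss(X, y, w, b, eps=1e-15):
    """Mean log loss (without the L2 penalty), clipped to avoid log(0)."""
    total = 0.0
    for x, t in zip(X, y):
        p = min(max(predict(w, b, x), eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(X)


def train_sgd(X, y, epochs=50, alpha=1e-4, eta0=0.1, seed=0):
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    history = [log_loss(X, y, w, b)]  # loss before any update
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = y[i] - predict(w, b, X[i])
            # Per-sample gradient w.r.t. w is -(y - p) * x + alpha * w,
            # so the descent step adds eta0 * err * x and shrinks w slightly.
            w = [(1.0 - eta0 * alpha) * wj + eta0 * err * xj
                 for wj, xj in zip(w, X[i])]
            b += eta0 * err
        history.append(log_loss(X, y, w, b))
    return w, b, history


# Hypothetical one-feature toy data: class 1 for larger x.
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0, 0, 1, 1]
w, b, history = train_sgd(X_toy, y_toy)
print(history[0], history[-1])  # loss before training vs. after the last epoch
```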
2
votes
1 answer
Anaconda: ValueError: Could not interpret optimizer identifier
I am trying to run this code:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD
and I get this error:
ImportError: cannot import name 'SGD' from 'keras.optimizers'…

FanChen
2
votes
2 answers
SGD algorithm from scratch to predict movie rating
Based on this equation, I have to compute the derivative w.r.t. b, which I did below.
[image: optimization equation]
def derivative_db(user_id,item_id,rating,U,V,mu,alpha):
    '''In this function, we will compute dL/db_i'''
    return…

bruce
2
votes
1 answer
BayesSearchCV is not working during SGDClassifier parameter tuning
I am trying to use BayesSearchCV to tune the parameters of SGDClassifier. Below is the code I tried.
import seaborn
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from skopt import…

BC Smith
2
votes
1 answer
CNN training loss regular spikes at the end of the epoch
I am training a CNN in PyTorch with Adam, with an initial learning rate of 1e-5. Each epoch has 5039 samples and the batch size is 1. I observe a regular spike in the training loss at the end of each epoch. Here is a plot of…

tivan
2
votes
1 answer
Model learns with SGD but not Adam
I was going through a basic PyTorch MNIST example here and noticed that when I changed the optimizer from SGD to Adam the model did not converge. Specifically, I changed line 106 from
optimizer = optim.SGD(model.parameters(), lr=args.lr,…

thefxperson
1
vote
1 answer
How is loss_fn connected to the model and optimizer in PyTorch?
The following code is just a template; you see this pattern a lot in AI code.
I have a specific question about loss.backward(). In the following code we have a model, and since we pass model.parameters() to the optimizer, the optimizer and model are…

WebMaster
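To make the question above concrete: in PyTorch, loss.backward() writes gradients into each parameter's .grad attribute, and optimizer.step() reads those same parameter objects. The toy below imitates that hand-off in plain Python. Every class and function name here is hypothetical and this is not PyTorch's implementation; in particular, the toy passes the model to the backward helper explicitly, whereas PyTorch reaches the parameters through the computation graph recorded on the loss.

```python
class Param:
    """A scalar parameter carrying its own gradient slot."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0


class TinyLinear:
    """Toy model computing w * x + b."""
    def __init__(self):
        self.w = Param(0.0)
        self.b = Param(0.0)

    def parameters(self):
        return [self.w, self.b]

    def __call__(self, x):
        return self.w.value * x + self.b.value


class TinySGD:
    """Toy optimizer holding references to the model's own Param objects."""
    def __init__(self, params, lr):
        self.params = params
        self.lr = lr

    def zero_grad(self):
        for p in self.params:
            p.grad = 0.0

    def step(self):
        for p in self.params:
            p.value -= self.lr * p.grad


def squared_error_backward(model, x, target):
    """Return the squared error and accumulate its gradients on the parameters."""
    diff = model(x) - target
    model.w.grad += 2.0 * diff * x
    model.b.grad += 2.0 * diff
    return diff * diff


model = TinyLinear()
optimizer = TinySGD(model.parameters(), lr=0.05)
for _ in range(50):
    optimizer.zero_grad()                              # clear old gradients
    loss = squared_error_backward(model, 2.0, 5.0)     # fills Param.grad
    optimizer.step()                                   # reads Param.grad
print(loss, model(2.0))
```

The loss function and the optimizer never reference each other directly; the shared parameter objects are the only link between them.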
1
vote
1 answer
Is there a way to print the calculated max gradient of each layer for a given mini-batch?
I am implementing a fully-connected model for classification using the MNIST dataset. A part of the code is the following:
model=tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(28, 28,…

Marios Gab
1
vote
1 answer
Stochastic Gradient Descent vs. Gradient Descent for x**2 function
I would like to understand the difference between SGD and GD on the simplest example function: y=x**2
Here is the GD function:
def gradient_descent(
    gradient, start, learn_rate, n_iter=50, tolerance=1e-06
):
    vector = start
    for _ in…

nuclear_engineer
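One way to contrast the two methods from the question above, sketched under an assumption: y = x**2 has no data to sample from, so the sketch rewrites it (up to a constant) as the mean of (x - a_i)**2 over offsets a_i that sum to zero. GD then uses the exact gradient 2*x, while SGD uses the gradient of one randomly chosen term. The function names, offsets, and default values are hypothetical and are not the asker's code.

```python
import random


def gd_on_square(start, learn_rate=0.1, n_iter=200, tolerance=1e-6):
    """Gradient descent on x**2 using the exact gradient 2*x."""
    x = start
    for _ in range(n_iter):
        step = learn_rate * 2.0 * x
        if abs(step) < tolerance:  # stop once updates become negligible
            break
        x -= step
    return x


def sgd_on_square(start, offsets, learn_rate=0.1, n_iter=200, seed=0):
    """SGD on mean((x - a)**2): each step uses one randomly drawn offset a."""
    rng = random.Random(seed)
    x = start
    for _ in range(n_iter):
        a = rng.choice(offsets)
        x -= learn_rate * 2.0 * (x - a)  # noisy but unbiased estimate of 2*x
    return x


offsets = [-1.0, 0.0, 1.0]  # mean is zero, so the averaged gradient is 2*x
print(gd_on_square(10.0))
print(sgd_on_square(10.0, offsets))
```

With a fixed learning rate, GD settles at the minimum, while SGD keeps jittering in a band around it whose width grows with the learning rate and the spread of the offsets.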
1
vote
2 answers
Linear autoencoder using Pytorch
How do we build a simple linear autoencoder and train it using torch.optim optimisers?
How do I do it using autograd (.backward()) and optimising the MSE loss, and then learn the values of the weights and biases in the encoder, and the decoder (ie.…

Punpun
1
vote
0 answers
How to learn a Normalizing Flow with Stochastic Gradient Descent
I have recently been working on implementing the Annealed Flow Transport method described in https://arxiv.org/abs/2102.07501. At one point the task is to minimize a given loss function by learning a Normalizing Flow using SGD. I studied many papers on…

Christian Lohmann