Questions tagged [sgd]

63 questions
14
votes
2 answers

SGD with momentum in TensorFlow

In Caffe, the SGD solver has a momentum parameter (link). In TensorFlow, I see that tf.train.GradientDescentOptimizer does not have an explicit momentum parameter. However, I can see that there is tf.train.MomentumOptimizer optimizer. Is it the…
A Das • 817 • 2 • 10 • 21
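
For reference, a minimal sketch of the mapping being asked about, assuming the current tf.keras API (the question's tf.train.MomentumOptimizer is the TF1-era counterpart); the toy variable and loss are hypothetical:

    import tensorflow as tf

    # Caffe's SGD solver with momentum maps to SGD with a momentum argument;
    # plain gradient descent is the same optimizer with momentum=0.0.
    # (TF1 equivalent: tf.train.MomentumOptimizer(learning_rate, momentum).)
    opt = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

    w = tf.Variable(0.0)                      # hypothetical parameter
    with tf.GradientTape() as tape:
        loss = tf.square(w - 3.0)             # hypothetical toy loss
    grads = tape.gradient(loss, [w])
    opt.apply_gradients(zip(grads, [w]))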
4
votes
2 answers

How to calculate maximum gradient for each layer given a mini-batch

I am trying to implement a fully-connected model for classification on the MNIST dataset. A part of the code is the following: n = 5 act_func = 'relu' classifier = tf.keras.models.Sequential() classifier.add(layers.Flatten(input_shape = (28, 28,…
anastasia • 63 • 4
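
A minimal sketch of one way to do this with a GradientTape, assuming tf.keras; the batch and the exact layer stack are hypothetical stand-ins for the asker's:

    import tensorflow as tf
    from tensorflow.keras import layers

    # Hypothetical MNIST-shaped mini-batch.
    x_batch = tf.random.uniform((32, 28, 28, 1))
    y_batch = tf.random.uniform((32,), maxval=10, dtype=tf.int32)

    classifier = tf.keras.models.Sequential([
        layers.Flatten(input_shape=(28, 28, 1)),
        layers.Dense(128, activation='relu'),
        layers.Dense(10, activation='softmax'),
    ])
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

    with tf.GradientTape() as tape:
        loss = loss_fn(y_batch, classifier(x_batch, training=True))
    grads = tape.gradient(loss, classifier.trainable_variables)

    # Maximum gradient magnitude per trainable variable (weights and biases
    # of each layer) for this mini-batch.
    for var, g in zip(classifier.trainable_variables, grads):
        print(var.name, float(tf.reduce_max(tf.abs(g))))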
4
votes
2 answers

Why can't the SGDRegressor in sklearn converge to the correct optimum?

I was practicing with SGDRegressor in sklearn but ran into some problems, which I have simplified to the following code. import numpy as np from sklearn.linear_model import SGDRegressor X = np.array([0,0.5,1]).reshape((3,1)) y =…
BlackieMia • 67 • 5
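
The excerpt's y is truncated, so the targets below are hypothetical; a minimal sketch of the usual remedies (no regularization, more iterations, a tuned learning rate), assuming a plain least-squares fit is the goal:

    import numpy as np
    from sklearn.linear_model import SGDRegressor

    X = np.array([0, 0.5, 1]).reshape((3, 1))
    y = np.array([0.0, 0.5, 1.0])              # hypothetical targets

    # The defaults (alpha=1e-4 L2 penalty, max_iter=1000) can stop noticeably
    # short of the least-squares optimum on a tiny dataset like this one.
    reg = SGDRegressor(alpha=0.0, max_iter=100000, tol=1e-10,
                       eta0=0.05, random_state=0)
    reg.fit(X, y)
    print(reg.coef_, reg.intercept_)           # should approach slope 1, intercept 0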
3
votes
0 answers

ValueError: No gradients provided for any variable: ['x_hat:0']

I am working on transforming images in order to craft adversarial attacks on computer vision systems that are robust to rotation. I would like to find an x_hat image that minimizes a mean loss over several random rotations. Here is how…
mad • 2,677 • 8 • 35 • 78
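
A minimal sketch of the usual shape of the fix, assuming TF2: x_hat must be a tf.Variable, and every op between it and the loss must be a differentiable TensorFlow op recorded on the tape. All names and the rotation stand-in are hypothetical:

    import tensorflow as tf

    x = tf.random.uniform((1, 32, 32, 3))       # hypothetical source image
    x_hat = tf.Variable(x, name='x_hat')        # the image being optimized
    opt = tf.keras.optimizers.SGD(learning_rate=0.1)

    for step in range(10):
        with tf.GradientTape() as tape:
            # Rotating via numpy/PIL, or outside the tape, severs the graph
            # and yields "No gradients provided for any variable: ['x_hat:0']".
            rotated = tf.image.rot90(x_hat)     # stand-in for a differentiable rotation
            loss = tf.reduce_mean(tf.square(rotated - x))   # hypothetical loss
        grads = tape.gradient(loss, [x_hat])
        opt.apply_gradients(zip(grads, [x_hat]))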
3
votes
1 answer

Using SGD without sklearn (LogLoss increasing with every epoch)

def train(X_train,y_train,X_test,y_test,epochs,alpha,eta0): w,b = initialize_weights(X_train[0]) loss_test=[] N=len(X_train) for i in range(0,epochs): print(i) for j in range(N-1): …
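
The excerpt cuts off before the update step, so as a point of comparison, here is a minimal from-scratch sketch of per-sample logistic-regression SGD whose log loss decreases; a loss that rises every epoch usually points to a flipped sign in the update or to evaluating the loss on unclipped probabilities. All names are hypothetical:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def log_loss(y, p, eps=1e-15):
        p = np.clip(p, eps, 1 - eps)            # avoid log(0)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    def train(X_train, y_train, epochs, eta0):
        w, b = np.zeros(X_train.shape[1]), 0.0
        losses = []
        for _ in range(epochs):
            for x, y in zip(X_train, y_train):
                p = sigmoid(np.dot(w, x) + b)
                w += eta0 * (y - p) * x         # note the sign: descent on log loss
                b += eta0 * (y - p)
            losses.append(log_loss(y_train, sigmoid(X_train @ w + b)))
        return w, b, losses

    X_demo = np.random.randn(200, 3)            # hypothetical data
    y_demo = (X_demo[:, 0] > 0).astype(float)
    w, b, losses = train(X_demo, y_demo, epochs=20, eta0=0.05)
    print(losses[0], losses[-1])                # should decrease, not increase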
2
votes
1 answer

Anaconda: ValueError: Could not interpret optimizer identifier

I am trying to run this code: from keras.models import Sequential from keras.layers import Dense, Activation from keras.optimizers import SGD and I get this error: ImportError: cannot import name 'SGD' from 'keras.optimizers'…
FanChen • 23 • 1 • 5
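
A minimal sketch of the import path that usually resolves this, assuming a TF2 installation where Keras is vendored as tensorflow.keras (standalone keras releases have moved SGD around in some versions, which is often what breaks the question's import):

    # With TensorFlow 2.x, import Keras pieces through tensorflow.keras:
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Activation
    from tensorflow.keras.optimizers import SGD

    model = Sequential([Dense(8, input_shape=(4,)), Activation('relu'), Dense(1)])
    model.compile(optimizer=SGD(learning_rate=0.01), loss='mse')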
2
votes
2 answers

SGD algorithm from scratch to predict movie rating

Based on this equation, I have to compute the derivative w.r.t. b, which I did below the optimization equation: def derivative_db(user_id,item_id,rating,U,V,mu,alpha): '''In this function, we will compute dL/db_i''' return…
bruce • 23 • 4
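
Since the excerpt truncates both the equation and the return statement, the following is only an assumed form: a sketch of dL/db_i for a squared-error matrix-factorization loss with L2 regularization, with b and c added to the signature as hypothetical bias arrays:

    import numpy as np

    def derivative_db(user_id, item_id, rating, U, V, mu, alpha, b, c):
        '''dL/db_i for the assumed loss
        L = (rating - (mu + b[user] + c[item] + U[user] @ V[item]))**2
            + alpha * b[user]**2'''
        pred = mu + b[user_id] + c[item_id] + U[user_id].dot(V[item_id])
        return -2 * (rating - pred) + 2 * alpha * b[user_id]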
2
votes
1 answer

BayesSearchCV is not working during SGDClassifier parameter tuning

I am trying to use BayesSearchCV for parameter tuning of the SGDClassifier. Below is the code I tried. import seaborn from sklearn.preprocessing import LabelEncoder from sklearn.model_selection import train_test_split from skopt import…
BC Smith • 727 • 1 • 7 • 19
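
A minimal sketch of a BayesSearchCV setup over SGDClassifier that works when the scikit-optimize and scikit-learn versions are compatible (older skopt releases break against newer sklearn, which is often the actual cause of "not working"); the dataset and search space are hypothetical:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import SGDClassifier
    from skopt import BayesSearchCV
    from skopt.space import Categorical, Real

    X, y = load_iris(return_X_y=True)           # hypothetical dataset

    search = BayesSearchCV(
        SGDClassifier(max_iter=1000, tol=1e-3, random_state=0),
        {
            'alpha': Real(1e-6, 1e-1, prior='log-uniform'),
            'penalty': Categorical(['l2', 'l1', 'elasticnet']),
            'loss': Categorical(['hinge', 'modified_huber']),
        },
        n_iter=16, cv=3, random_state=0,
    )
    search.fit(X, y)
    print(search.best_params_, search.best_score_)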
2
votes
1 answer

CNN training loss shows regular spikes at the end of each epoch

I am training a CNN in PyTorch with Adam; the initial learning rate is 1e-5. I have 5039 samples in my epoch and the batch size is 1. I have observed a regular spike pattern in the training loss at the end of each epoch. Here is a plot of…
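
Not a diagnosis of this specific plot, but a common cause of end-of-epoch spikes is a fixed sample order, where the same (possibly hard) samples always close the epoch. A minimal PyTorch sketch of the usual mitigation, with a hypothetical dataset:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    train_dataset = TensorDataset(torch.randn(5039, 3, 32, 32),
                                  torch.randint(0, 10, (5039,)))
    # shuffle=True reshuffles every epoch, so no fixed group of samples
    # always lands at the epoch boundary.
    loader = DataLoader(train_dataset, batch_size=1, shuffle=True)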
2
votes
1 answer

Model learns with SGD but not Adam

I was going through a basic PyTorch MNIST example here and noticed that when I changed the optimizer from SGD to Adam the model did not converge. Specifically, I changed line 106 from optimizer = optim.SGD(model.parameters(), lr=args.lr,…
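
A minimal sketch of the usual resolution: Adam typically wants a much smaller learning rate than the SGD value passed through args.lr, so reusing it can prevent convergence. The stand-in model and lr=1e-3 (Adam's conventional default) are assumptions, not the example's values:

    import torch.nn as nn
    import torch.optim as optim

    model = nn.Linear(784, 10)                  # stand-in for the example's network
    # optim.Adam(model.parameters(), lr=args.lr) with an SGD-sized lr often
    # fails to converge; Adam's conventional default is 1e-3.
    optimizer = optim.Adam(model.parameters(), lr=1e-3)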
1
vote
1 answer

How is loss_fn connected to the model and optimizer in PyTorch?

The following code is just a template; you see this pattern a lot in AI code. I have a specific question about loss.backward(). In the following code we have a model; as we pass model.parameters() to the optimizer, the optimizer and model are…
WebMaster • 3,050 • 4 • 25 • 77
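
A minimal sketch of the connection, with hypothetical model, data, and loss: the optimizer holds references to the same parameter tensors returned by model.parameters(), loss.backward() fills each tensor's .grad attribute through the autograd graph, and optimizer.step() reads those .grad fields:

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 1)                     # hypothetical model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()
    x, y = torch.randn(8, 4), torch.randn(8, 1) # hypothetical batch

    optimizer.zero_grad()        # clear .grad on every registered parameter
    loss = loss_fn(model(x), y)  # loss is tied to model via the autograd graph
    loss.backward()              # writes gradients into each parameter's .grad
    optimizer.step()             # reads those .grad fields to update parameters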
1
vote
1 answer

Is there a way to print the calculated max gradient of each layer for a given mini-batch?

I am implementing a fully-connected model for classification using the MNIST dataset. A part of the code is the following: model=tf.keras.models.Sequential([ tf.keras.layers.Input(shape=(28, 28,…
Marios Gab • 13 • 3
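
This is essentially the same task as "How to calculate maximum gradient for each layer given a mini-batch" above; a minimal sketch of the printing step, assuming tf.keras and a hypothetical mini-batch:

    import tensorflow as tf

    model = tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    x = tf.random.uniform((32, 28, 28))                      # hypothetical batch
    y = tf.random.uniform((32,), maxval=10, dtype=tf.int32)

    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    for var, g in zip(model.trainable_variables, grads):
        tf.print(var.name, tf.reduce_max(tf.abs(g)))         # max gradient per layer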
1
vote
1 answer

Stochastic Gradient Descent vs. Gradient Descent for the x**2 function

I would like to understand the difference between SGD and GD using the simplest example of a function: y = x**2. The GD function is here: def gradient_descent( gradient, start, learn_rate, n_iter=50, tolerance=1e-06 ): vector = start for _ in…
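
A minimal sketch of the contrast, reusing the question's gradient_descent signature: on y = x**2 there is no dataset, so true SGD reduces to GD; the stochastic part only appears when the gradient is estimated from samples, mimicked here with noise. The noise scale is arbitrary:

    import random

    def gradient_descent(gradient, start, learn_rate, n_iter=50, tolerance=1e-06):
        vector = start
        for _ in range(n_iter):
            step = -learn_rate * gradient(vector)
            if abs(step) <= tolerance:
                break
            vector += step
        return vector

    # GD: exact gradient of y = x**2 is 2x.
    x_gd = gradient_descent(lambda x: 2 * x, start=10.0, learn_rate=0.1)

    # "SGD": the same descent with a noisy gradient estimate, standing in
    # for a gradient computed from a random sample of data.
    x_sgd = gradient_descent(lambda x: 2 * x + random.gauss(0, 0.5),
                             start=10.0, learn_rate=0.1)
    print(x_gd, x_sgd)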
1
vote
2 answers

Linear autoencoder using Pytorch

How do we build a simple linear autoencoder and train it using torch.optim optimisers? How do I do it using autograd (.backward()) and optimising the MSE loss, and then learn the values of the weights and biases in the encoder, and the decoder (ie.…
Punpun • 21 • 4
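
A minimal sketch of such an autoencoder, with hypothetical data and sizes: two nn.Linear layers, MSE reconstruction loss, .backward() for the gradients, and a torch.optim optimizer learning the weights and biases:

    import torch
    import torch.nn as nn

    x = torch.randn(64, 10)                     # hypothetical data: 64 samples, dim 10

    encoder = nn.Linear(10, 3)                  # compress to dimension 3
    decoder = nn.Linear(3, 10)
    model = nn.Sequential(encoder, decoder)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
    loss_fn = nn.MSELoss()

    for epoch in range(200):
        optimizer.zero_grad()
        loss = loss_fn(model(x), x)             # reconstruct the input
        loss.backward()                         # autograd fills in all gradients
        optimizer.step()

    print(encoder.weight.shape, decoder.bias.shape)   # learned parameters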
1
vote
0 answers

How to learn a Normalizing Flow with Stochastic Gradient Descent

I'm currently working on implementing the Annealed Flow Transport Method as described in https://arxiv.org/abs/2102.07501. At one point the task is to minimize a given loss function by learning a Normalizing Flow using SGD. I studied many papers on…
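
Nothing specific to the Annealed Flow Transport paper, whose flows and loss are far richer, but a minimal sketch of the generic inner task: fitting a (deliberately trivial) affine normalizing flow by SGD on a negative log-likelihood via the change-of-variables formula. Data and hyperparameters are hypothetical:

    import torch

    x = 2.5 * torch.randn(1024) + 4.0           # hypothetical 1-D data

    # Affine flow x = exp(s) * z + m with base z ~ N(0, 1);
    # inverse z = (x - m) * exp(-s), log|dz/dx| = -s.
    s = torch.zeros(1, requires_grad=True)
    m = torch.zeros(1, requires_grad=True)
    opt = torch.optim.SGD([s, m], lr=0.05)
    base = torch.distributions.Normal(0.0, 1.0)

    for step in range(500):
        opt.zero_grad()
        z = (x - m) * torch.exp(-s)
        log_px = base.log_prob(z) - s           # change of variables
        loss = -log_px.mean()                   # negative log-likelihood
        loss.backward()
        opt.step()

    print(m.item(), torch.exp(s).item())        # roughly 4.0 and 2.5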