
I know that stochastic gradient descent gives different results on every run. What are the current best practices for reducing this variance? I tried to fit a simple function with two different approaches, and every time I train them I get very different results.

Input data:

import matplotlib.pyplot as plt
import tensorflow as tf

def plot(model_out):
  fig, ax = plt.subplots()
  ax.grid(True, which='both')
  ax.axhline(y=0, color='k', linewidth=1)
  ax.axvline(x=0, color='k', linewidth=1)

  ax.plot(x_line, y_line, c='g', linewidth=1)
  ax.scatter(inputs, targets, c='b', s=8)
  ax.scatter(inputs, model_out, c='r', s=8)
  plt.show()  # needed when running as a script rather than in a notebook

a = 5.0; b = 3.0; x_left, x_right = -16., 16.
NUM_EXAMPLES = 200
noise   = tf.random.normal((NUM_EXAMPLES,1))

inputs  = tf.random.uniform((NUM_EXAMPLES,1), x_left, x_right)
targets = a * tf.sin(inputs) + b + noise
x_line  = tf.linspace(x_left, x_right, 500)
y_line  = a * tf.sin(x_line) + b

Keras training:

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(50, activation='relu', input_shape=(1,)))
model.add(tf.keras.layers.Dense(50, activation='relu'))
model.add(tf.keras.layers.Dense(1))

model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(0.01))
model.fit(inputs, targets, batch_size=200, epochs=2000, verbose=0)

print(model.evaluate(inputs, targets, verbose=0))
plot(model.predict(inputs))

[Image: output of plot() for the Keras-trained model]

Manual training:

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(50, activation='relu', input_shape=(1,)))
model.add(tf.keras.layers.Dense(50, activation='relu'))
model.add(tf.keras.layers.Dense(1))

optimizer = tf.keras.optimizers.Adam(0.01)

@tf.function
def train_step(inpt, targ):
  with tf.GradientTape() as g:
    model_out = model(inpt)
    model_loss = tf.reduce_mean(tf.square(tf.math.subtract(targ, model_out)))

  gradients = g.gradient(model_loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))
  return model_loss

train_ds = tf.data.Dataset.from_tensor_slices((inputs, targets))
train_ds = train_ds.repeat(2000).batch(200)

def train(train_ds):
  for inpt, targ in train_ds:
    model_loss = train_step(inpt, targ)
  tf.print(model_loss)

train(train_ds)
plot(tf.squeeze(model(inputs)))

[Image: output of plot() for the manually trained model]

dereks
  • Try posting your question in **AI Stack Exchange**: https://ai.stackexchange.com/ – Ahmad Dec 01 '19 at 11:44
  • Aragon S, ok, thanks. – dereks Dec 01 '19 at 11:46
  • Increasing the batch size directly reduces variance – jeremy_rutman Dec 01 '19 at 11:57
  • The goal is to use a small dataset. I have `200` examples in the dataset and `batch_size=200`. – dereks Dec 01 '19 at 12:00
  • Weights are randomly chosen, so obviously you'll get different results each time. Set the random seeds (NumPy's, TF's and Python's) to predefined values and you'll be able to reproduce them – bluesummers Dec 01 '19 at 12:47
  • The question is not about how to reproduce. It's about how to converge with low variance. – dereks Dec 01 '19 at 13:16
  • See [here](https://stackoverflow.com/questions/59075244/if-keras-results-are-not-reproducible-whats-the-best-practice-for-comparing-mo/59075958#59075958); without specifying further, loss variance _is_ related to reproducibility - else you need to define with respect to _what_ the loss is fluctuating (hyperparameters, repeated runs, etc). Also show some loss plots – OverLordGoldDragon Dec 01 '19 at 14:08
  • @OverLordGoldDragon Thank you, it was helpful. But I still don't understand what I should do. Do you know where to find a proper example of model comparison in TF2? – dereks Dec 01 '19 at 16:12
  • I don't follow the question; what are you asking, exactly? Yes, loss will _vary_ with _varying_ hyperparameters, as it should - but if it varies for the same hyperparameters, you have a reproducibility problem. If you are asking how to do hyperparameter selection, that's an entirely different question, but see [here](https://stackoverflow.com/questions/58103035/keras-early-stopping-with-train-on-batch/58103272#58103272) for an idea – OverLordGoldDragon Dec 01 '19 at 16:26
  • I'm looking for an example of comparing models with `the same hyperparameters` and `different random seeds` in `TF2`. I mean how to reduce the variance caused by random weight initialization. – dereks Dec 01 '19 at 16:39
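
To make the last comment concrete, here is a minimal sketch of comparing runs that share hyperparameters and differ only in the random seed. It is not TF2 code: `toy_train_once` is a hypothetical pure-Python stand-in (a one-parameter model `sin(w * x)` fitted by gradient descent) so the harness runs without any dependencies, and `summarize_runs` is an illustrative helper name, not a library function. In TF2 the stand-in would presumably be replaced by a function that calls `tf.random.set_seed(seed)`, builds a fresh model, fits it, and returns the final loss.

```python
import math
import random
import statistics

# Toy data: 40 points of sin(2x). A stand-in for the real dataset.
XS = [-3.0 + 6.0 * i / 39 for i in range(40)]
YS = [math.sin(2.0 * x) for x in XS]


def mse(w):
    """Mean squared error of the one-parameter model sin(w * x)."""
    return sum((math.sin(w * x) - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def toy_train_once(seed, steps=300, lr=0.02):
    """Hypothetical stand-in for one training run.

    In TF2 this would instead call tf.random.set_seed(seed), build a
    fresh model, fit it, and return the final loss.
    """
    rng = random.Random(seed)  # the seed controls only the initial weight
    w = rng.uniform(0.0, 4.0)
    for _ in range(steps):
        # Analytic gradient of the MSE with respect to w.
        grad = sum(
            2.0 * (math.sin(w * x) - y) * math.cos(w * x) * x
            for x, y in zip(XS, YS)
        ) / len(XS)
        w -= lr * grad
    return mse(w)


def summarize_runs(train_once, seeds):
    """Run train_once for every seed; return (mean, std, per-seed losses)."""
    losses = [train_once(seed) for seed in seeds]
    return statistics.mean(losses), statistics.pstdev(losses), losses


mean_loss, std_loss, losses = summarize_runs(toy_train_once, seeds=range(10))
print(f"final loss over 10 seeds: {mean_loss:.4f} +/- {std_loss:.4f}")
```

Under this setup, two training approaches would be compared by running each over the same list of seeds and looking at the difference in mean loss relative to the spread, rather than comparing one run of each. This only measures the seed-to-seed variance; it does not reduce it.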

0 Answers