I am training an LSTM network in TensorFlow. My model has the following configuration:
- Time steps: 1700
- Cell size: 120
- Number of input features (x): 512
- Batch size: 34
- Optimizer: AdamOptimizer with a learning rate of 0.01
- Number of epochs: 20
I am using a GTX 1080 Ti, and my TensorFlow version is 1.8.
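For clarity, here is a simplified sketch of roughly how my graph is built (the names `x`, `y`, and the squared-error loss are illustrative stand-ins, not my exact code):

```python
import tensorflow as tf

time_steps = 1700
cell_size = 120
num_features = 512
batch_size = 34

# Inputs shaped [batch, time, features]; targets shaped like the LSTM outputs
x = tf.placeholder(tf.float32, [batch_size, time_steps, num_features], name="x")
y = tf.placeholder(tf.float32, [batch_size, time_steps, cell_size], name="y")

cell = tf.nn.rnn_cell.LSTMCell(cell_size)
outputs, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)

# Illustrative loss; my real output layer and loss are omitted here
loss = tf.reduce_mean(tf.squared_difference(outputs, y))
train_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)
```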
Additionally, I have set the graph-level random seed through tf.set_random_seed(mseed), and I have also set a seed for every trainable variable's initializer, so that I can reproduce the same results across multiple runs.
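Concretely, the seeding looks roughly like this (a simplified fragment; `mseed` is just whatever fixed integer I use, and glorot_uniform is only one example of a seeded initializer):

```python
import tensorflow as tf

mseed = 123      # any fixed integer; the point is that it never changes between runs
cell_size = 120

# Graph-level seed
tf.set_random_seed(mseed)

# Operation-level seed on each trainable variable's initializer,
# shown here for the LSTM weight matrices
cell = tf.nn.rnn_cell.LSTMCell(
    cell_size,
    initializer=tf.glorot_uniform_initializer(seed=mseed))
```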
After training the model multiple times, 20 epochs each time, I found that I get exactly the same loss for the first several epochs (7, 8, or 9) of each run, and then the losses start to diverge. I am wondering why this is occurring, and, if possible, how one can make the results of a model fully reproducible.
Additionally, in my case I feed the whole dataset in every iteration. That is, I am doing full backpropagation through time (BPTT), not truncated BPTT. In other words, I have 20 iterations in total, which is equal to the number of epochs, as sketched below.
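So my training loop is essentially the following (continuing the sketch above; `all_inputs` and `all_targets` stand in for my actual NumPy arrays):

```python
import numpy as np

# Stand-ins for my real data: a single batch that holds the whole dataset
all_inputs = np.random.randn(batch_size, time_steps, num_features).astype(np.float32)
all_targets = np.random.randn(batch_size, time_steps, cell_size).astype(np.float32)

num_epochs = 20

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(num_epochs):
        # One parameter update per epoch: the full 1700-step sequences are fed
        # at once, so gradients flow through the entire sequence (full BPTT)
        _, epoch_loss = sess.run([train_op, loss],
                                 feed_dict={x: all_inputs, y: all_targets})
        print(epoch, epoch_loss)
```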
The following figure demonstrates my problem. Please note that each row corresponds to one epoch and each column corresponds to a different run (I only included 2 columns/runs to demonstrate my point).
Finally, when I replace the input features with new features of dimension 100, I get better results, as shown in the following image:
Therefore, I am not sure whether this is a hardware issue or not.
Any help is much appreciated!!