I trained a deep learning model in TensorFlow.

I'd like to load the loss value of each epoch from a checkpoint file into a numpy.array.

I mean something like this:

np.array([3.45342, 3.23080, 2.98729, ...])

Don't checkpoint files have loss information of all epochs?

Or do I need to save all the values myself during training?

How do I do that?

hrsma2i
  • Maybe [TensorBoard](https://github.com/tensorflow/tensorboard)? – LI Xuhong Apr 30 '18 at 12:49
  • I'd like to plot loss curve more simply instead of using tensorboard. – hrsma2i Apr 30 '18 at 12:58
  • It seems that you can [download raw data from TensorBoard](https://stackoverflow.com/questions/42355122/can-i-export-a-tensorflow-summary-to-csv). If you really don't want to use TensorBoard, you can store the losses in a list or an array during training and write them to a `.npy` file or even a `.txt` file. – LI Xuhong Apr 30 '18 at 13:09

1 Answer

Don't checkpoint files have loss information of all epochs?

No, they don't. Checkpoint files are designed to save and restore variables. They contain only the values of the specified (or all) variables at the time of saving, so that the checkpoint can be restored later, hence the name. Since the loss is usually not a variable but an intermediate tensor, it is usually not saved in checkpoint files at all.

You can, of course, simply track and save the loss yourself, without TensorBoard if you prefer. I usually use pandas for that. Here is one way to do it:

import tensorflow as tf
import pandas as pd

# Define a deliberately trivial model that fits a single scalar, purely for
# demonstration. Note: this uses the TensorFlow 1.x graph API
# (tf.placeholder, tf.Session).

true = tf.placeholder(shape=(), dtype=tf.float32)
learned = tf.Variable(initial_value=0., dtype=tf.float32)

loss = tf.squared_difference(true, learned)

train = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

if __name__ == '__main__':
    session = tf.Session()
    session.run(tf.global_variables_initializer())

    # create pandas data frame for logging
    log = pd.DataFrame(columns=['loss'])

    # train and append the loss to the data frame on every step
    for step in range(0, 100):
        log.loc[step] = session.run([train, loss], feed_dict={true: 100.})[1]

    # save it
    log.to_hdf('./log.h5', key='log')

Then, after training is done, you can load and plot the logged data in a different script like this:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# load the dataframe
log = pd.read_hdf('./log.h5', key='log')

# and this is how you could get your NumPy array
print(log['loss'].to_numpy(dtype=np.float32))

# usually this is sufficient though, since the index is the training step
# and matplotlib can directly plot that
print(log['loss'])
plt.plot(log['loss'])
plt.ylabel('Loss')
plt.xlabel('Step')
plt.show()
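The question asks for one loss value per epoch, while the log above has one row per training step. A minimal sketch of collapsing a per-step log into per-epoch means follows. The `steps_per_epoch` value and the synthetic loss numbers are assumptions made up for illustration; they are not part of the training script above.

```python
import numpy as np
import pandas as pd

# Assumption: every epoch consists of the same number of training steps.
steps_per_epoch = 4

# Synthetic per-step log standing in for the data frame loaded from disk.
log = pd.DataFrame({"loss": [4.0, 3.0, 2.0, 1.0, 1.0, 1.0, 0.5, 0.5]})

# Map each step index to the epoch it belongs to (0, 0, 0, 0, 1, 1, 1, 1).
epoch_ids = np.asarray(log.index) // steps_per_epoch

# Average the step losses within each epoch and convert to a NumPy array.
epoch_loss = log.groupby(epoch_ids)["loss"].mean().to_numpy()
print(epoch_loss)
```

Taking the mean is only one choice; using `.last()` instead of `.mean()` would give the loss at the final step of each epoch.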

As LI Xuhong suggests in the comments, there are many ways to achieve something like this without reinventing the wheel. Since it takes only a few lines of code, though, I usually prefer to do it myself as demonstrated above, especially when I need my own logging for the project anyway.
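For reference, the plain-NumPy variant mentioned in the comments (collect the losses in a list and write a `.npy` file) might look roughly like the sketch below. The `train_step` function is a hypothetical stand-in that returns a fake loss, and the file is written to the system temp directory only to keep the example self-contained.

```python
import os
import tempfile

import numpy as np


def train_step(step):
    """Hypothetical stand-in for one real training step; returns a fake loss."""
    return 1.0 / (step + 1)


# Collect one loss value per step (or per epoch) in a plain Python list.
losses = []
for step in range(5):
    losses.append(train_step(step))

# Convert to an array and save it next to your other training outputs.
loss_array = np.array(losses)
path = os.path.join(tempfile.gettempdir(), "losses.npy")
np.save(path, loss_array)

# Later, possibly in a different script, load the array back.
restored = np.load(path)
print(restored)
```

In a real training loop, `train_step` would be replaced by the call that runs the optimizer and returns the loss, such as the `session.run` call in the script above.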