Don't checkpoint files have loss information of all epochs?
No, they don't. Checkpoint files are designed to save and restore variables. They contain only the values of the specified (or all) variables at the time of saving, so that the checkpoint can be restored later, hence the name. Since the loss is usually an intermediate tensor rather than a variable, it is usually not saved in checkpoint files at all.
Of course, you can simply track and save the loss yourself, without TensorBoard if you prefer. I usually use pandas for that.
Here is one way to achieve this:
import tensorflow as tf
import pandas as pd
# define a completely pointless model which just fits a single point just for
# demonstration
true = tf.placeholder(shape=(), dtype=tf.float32)
learned = tf.Variable(initial_value=0., dtype=tf.float32)
loss = tf.squared_difference(true, learned)
train = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
if __name__ == '__main__':
    session = tf.Session()
    session.run(tf.global_variables_initializer())
    # create pandas data frame for logging
    log = pd.DataFrame(columns=['loss'])
    # train and append the loss to the data frame on every step
    for step in range(0, 100):
        log.loc[step] = session.run([train, loss], feed_dict={true: 100.})[1]
    # save it
    log.to_hdf('./log.h5', key='log')
Then, after the training is done, you can load and plot the logged data in a different script like this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# load the dataframe
log = pd.read_hdf('./log.h5', 'log')
# and this is how you could get your numpy array
print(np.asarray(log['loss'], dtype=np.float32))
# usually this is sufficient though, since the index is the training step
# and matplotlib can directly plot that
print(log['loss'])
plt.plot(log['loss'])
plt.ylabel('Loss')
plt.xlabel('Step')
plt.show()
As LI Xuhong suggests in the comments, there are many different ways to achieve something like this without reinventing the wheel. Since it takes only a few lines of code, though, I usually prefer to do it myself as demonstrated above, especially when I need my own logging for the project anyway.
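If you would rather avoid the pandas and HDF5 dependencies, the same idea works with nothing but the standard library. The following is only a minimal sketch: the helper names train_and_log and load_losses are made up for illustration, and the plain-Python gradient descent loop merely stands in for the session.run call of the toy model above (same target of 100, same learning rate of 0.1).

```python
import csv
import os
import tempfile


def train_and_log(log_path, steps=100, target=100.0, learning_rate=0.1):
    """Hypothetical helper: fit a single value to `target` with plain
    gradient descent and write the loss of every step to a CSV file."""
    learned = 0.0
    with open(log_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["step", "loss"])
        for step in range(steps):
            # squared difference, like the loss of the toy model
            loss = (target - learned) ** 2
            writer.writerow([step, loss])
            # gradient of (target - learned)^2 with respect to `learned`
            grad = -2.0 * (target - learned)
            learned -= learning_rate * grad
    return learned


def load_losses(log_path):
    """Hypothetical helper: read the logged losses back as a list of floats."""
    with open(log_path, newline="") as f:
        return [float(row["loss"]) for row in csv.DictReader(f)]


# the file name and location are assumptions, chosen only for the demo
log_path = os.path.join(tempfile.gettempdir(), "loss_log.csv")
final_value = train_and_log(log_path)
losses = load_losses(log_path)
print(len(losses), losses[0], losses[-1])
```

The resulting list can be passed to plt.plot just like the pandas column. In a real training loop you would write the value returned by session.run instead of computing the loss by hand.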