import tensorflow as tf

# construct graph
v1 = tf.Variable([0], name='v1')
v2 = tf.Variable([0], name='v2')

# run graph
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  saver = tf.train.Saver()
  saver.save(sess, 'ckp')

What is the relationship between the index file and the data file?

.
├── checkpoint
├── ckp.data-00000-of-00001
├── ckp.index
├── ckp.meta
jkschin
Guangcong Liu

2 Answers


This question is similar to "TensorFlow, why there are 3 files after saving the model?", but that question does not discuss in detail what the .index and .data-00000-of-00001 files are.

Similarly, "What is the TensorFlow checkpoint meta file?" explains what the .meta file does.

The .index file stores the names and shapes of the saved variables, so it is usually small. The .data-00000-of-00001 file stores the actual values of all the saved variables, so it is usually much larger.

We can test it out with the code below. But before this, run the MNIST example in tensorflow/tensorflow/examples/tutorials/mnist/fully_connected_feed.py to generate the log files.

from tensorflow.python.training import checkpoint_utils as cp

# Checkpoint prefix written by the MNIST example (no file extension).
ckpt = '/tmp/tensorflow/mnist/logs/fully_connected_feed/model.ckpt-1999'

# List the (name, shape) pair of every saved variable.
print(cp.list_variables(ckpt))
# Load the values of a single variable.
print(cp.load_variable(ckpt, 'hidden1/biases'))

cp.list_variables then prints out the following:

[('global_step', []), ('hidden1/biases', [128]), ('hidden1/weights', [784, 128]), ('hidden2/biases', [32]), ('hidden2/weights', [128, 32]), ('softmax_linear/biases', [10]), ('softmax_linear/weights', [32, 10])]

cp.load_variable then prints out the entire vector of floating point values:

[  1.49112539e-02   2.43028291e-02   1.82662811e-02   2.32475083e-02
  -7.84891471e-03   1.87947564e-02  -6.21244172e-03   9.12105478e-03
  -1.70869497e-03   2.94519793e-02   6.23045377e-02   1.99174266e-02
   ...
   1.13238255e-02  -1.11185517e-02   2.25203596e-02  -4.95722517e-04
   1.22644939e-02   9.39049758e-03   3.05090044e-02   1.62753556e-02
   2.32785419e-02   3.78636681e-02   2.61069946e-02   2.02859659e-02]

cp.list_variables can run with only .index present, but cp.load_variable requires both .index and .data-00000-of-00001 to run.
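
The file requirements above can be checked before calling either function. The following is a minimal sketch using only the standard library. `checkpoint_capabilities` is a hypothetical helper name, not part of TensorFlow, and it assumes the data files follow the `<prefix>.data-NNNNN-of-MMMMM` naming seen in the question.

```python
import glob
import os
import re

# Matches data file names such as "ckp.data-00000-of-00001".
_SHARD_RE = re.compile(r"\.data-(\d{5})-of-(\d{5})$")


def checkpoint_capabilities(prefix):
    """Report which checkpoint_utils calls should work for a checkpoint prefix.

    Hypothetical helper (not a TensorFlow API). It only inspects file names
    and never opens the files.
    """
    has_index = os.path.isfile(prefix + ".index")
    candidates = glob.glob(glob.escape(prefix) + ".data-*-of-*")
    shards = [p for p in candidates if _SHARD_RE.search(p)]
    # Data is complete when the number of files found equals the "of-MMMMM" total.
    totals = {int(_SHARD_RE.search(p).group(2)) for p in shards}
    has_all_data = len(totals) == 1 and len(shards) == totals.pop()
    return {
        "list_variables": has_index,
        "load_variable": has_index and has_all_data,
    }
```

For the `ckp` prefix from the question, `checkpoint_capabilities('ckp')` should report both operations as available as long as `ckp.index` and `ckp.data-00000-of-00001` are in the working directory.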

jkschin

This is explained in the comments in the TensorFlow source code:

The ".index" file is a string-string immutable table (tensorflow::table::Table). Each key is a name of a tensor and its value is a serialized BundleEntryProto. Each BundleEntryProto describes the metadata of a tensor: which of the "data" files contains the content of a tensor, the offset into that file, checksum, some auxiliary data, etc.
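
The lookup this description implies can be sketched in plain Python. This is an illustration only, not TensorFlow's reader. `IndexEntry` and `read_tensor_bytes` are hypothetical names, the `size` field is an assumption (the comment quoted above names only the file, the offset, and the checksum), and `zlib.crc32` is a stand-in for whatever checksum the real format uses.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """Toy stand-in for the per-tensor metadata kept in the index."""
    shard_id: int   # which data file holds the bytes
    offset: int     # byte offset into that file
    size: int       # number of bytes to read (assumed field)
    checksum: int   # checksum of those bytes (zlib.crc32 as a stand-in)


def read_tensor_bytes(index, shards, name):
    """Look up `name` in the index, then slice its bytes out of the data shard."""
    entry = index[name]
    raw = shards[entry.shard_id][entry.offset:entry.offset + entry.size]
    if zlib.crc32(raw) != entry.checksum:
        raise ValueError("checksum mismatch for %r" % name)
    return raw


# Build a toy single-shard "checkpoint" in memory: the index maps each name
# to its location, and the data shard is just the concatenated payloads.
payloads = {"v1": b"\x00\x00\x00\x00", "v2": b"\x01\x00\x00\x00"}
data, index = b"", {}
for name, blob in sorted(payloads.items()):
    index[name] = IndexEntry(0, len(data), len(blob), zlib.crc32(blob))
    data += blob
shards = [data]

print(sorted(index))                            # names come from the index alone
print(read_tensor_bytes(index, shards, "v2"))   # values need the data shard too
```

In this toy model, enumerating names touches only `index`, while reading a value needs both `index` and `shards`, which mirrors the split between the .index and .data files.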

Salvador Dali