27

When saving a checkpoint, TensorFlow often saves a meta file: my_model.ckpt.meta. What is in that file? Can we still restore a model if we delete it, and what information do we lose if we restore a model without the meta file?

Clash

1 Answer

39

This file contains a serialized MetaGraphDef protocol buffer. The MetaGraphDef is designed as a serialization format that includes all of the information required to restore a training or inference process (including the GraphDef that describes the dataflow, and additional annotations that describe the variables, input pipelines, and other relevant information). For example, the MetaGraphDef is used by TensorFlow Serving to start an inference service based on your trained model. We are investigating other tools that could use the MetaGraphDef for training.
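As a rough illustration of what the meta file makes possible, the sketch below saves a tiny graph and then restores it in a fresh graph using only the `.meta` file and the checkpoint, without re-running any model-building code. It assumes a TensorFlow installation that exposes the 1.x-style API under `tf.compat.v1`; the variable name `v` and the temporary directory are hypothetical.

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1  # assumption: the TF 1.x-style API is available here

# Hypothetical checkpoint location, used only for this illustration.
ckpt_prefix = os.path.join(tempfile.mkdtemp(), "my_model.ckpt")

# Build a tiny graph and save it. By default the Saver also writes
# my_model.ckpt.meta, which holds the serialized MetaGraphDef.
with tf.Graph().as_default():
    v = tf1.get_variable("v", initializer=tf.constant([3.0, 4.0]))
    with tf1.Session() as sess:
        sess.run(tf1.global_variables_initializer())
        saved_path = tf1.train.Saver().save(sess, ckpt_prefix)

# In a fresh, empty graph, rebuild the graph structure from the meta file
# and then load the variable values from the checkpoint.
with tf.Graph().as_default():
    saver = tf1.train.import_meta_graph(saved_path + ".meta")
    with tf1.Session() as sess:
        saver.restore(sess, saved_path)
        # import_meta_graph also restores the variable collections.
        meta_restored = sess.run(tf1.global_variables()[0]).tolist()
```

Here `import_meta_graph` stands in for the model-building code: the graph structure and the variable collections come from the MetaGraphDef, and only the values come from the checkpoint.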

Assuming that you still have the Python code for your model, you do not need the MetaGraphDef to restore the model, because you can reconstruct all of the information in the MetaGraphDef by re-executing the Python code that builds the model. To restore from a checkpoint, you only need the checkpoint files that contain the trained weights, which are written periodically to the same directory.
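A minimal sketch of that second case, assuming the 1.x-style API under `tf.compat.v1`: the meta file is deleted after saving, and the weights are still restored because the graph is rebuilt by calling the same Python function again. `build_model` is a hypothetical stand-in for your own model code, and the paths are temporary placeholders.

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1  # assumption: the TF 1.x-style API is available here


def build_model():
    """Hypothetical stand-in for the Python code that builds your model."""
    return tf1.get_variable("w", shape=[2], initializer=tf1.zeros_initializer())


# Hypothetical checkpoint location, used only for this illustration.
ckpt_prefix = os.path.join(tempfile.mkdtemp(), "my_model.ckpt")

# "Train" (here: just assign known values) and save a checkpoint.
with tf.Graph().as_default():
    w = build_model()
    saver = tf1.train.Saver()
    with tf1.Session() as sess:
        sess.run(w.assign([1.0, 2.0]))
        saved_path = saver.save(sess, ckpt_prefix)

# Delete the meta file; the files holding the weights are left in place.
os.remove(saved_path + ".meta")

# Rebuild the same graph from Python code, then restore the weights.
with tf.Graph().as_default():
    w = build_model()
    saver = tf1.train.Saver()
    with tf1.Session() as sess:
        saver.restore(sess, saved_path)
        restored = sess.run(w).tolist()
```

The restore matches variables by name, so this only works if the rebuilt graph creates variables with the same names and shapes as the one that was saved.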

mrry
  • 4
    Is there a way to load these with the C++ api? I'm running into trouble executing graphs in C++ because the variables are not initialized with regular GraphDef protos. – Vlad Firoiu Apr 14 '16 at 18:43
  • Thanks for explaining and it helps a lot. – tobe Jun 28 '16 at 07:27
  • Hi @mrry, when I save multiple checkpoints with global steps, I get multiple `name-step.ckpt` and `name-step.ckpt.meta` files, but only one `checkpoint` file. What is this `checkpoint` file, and why isn't there one for each `name-step.ckpt`? – tnq177 Jul 30 '16 at 15:09
  • 3
    The `"checkpoint"` file contains a serialized `CheckpointState` protocol buffer, which contains pointers to the most recent checkpoints of the model. It is rewritten each time a checkpoint is saved, to give a single location where information about the up-to-date checkpoints is stored. – mrry Aug 01 '16 at 14:54
  • I do not see a .ckpt file, but I do see .ckpt-n.meta, .ckpt-n.data-00000-of-00001 and .ckpt-n.index files (where n is the iteration number). What could be the reason for this? – Chandra Dec 12 '16 at 21:22
  • 4
    @Chandra: The checkpoint format was changed in TensorFlow 0.12. Checkpoints now contain multiple files. If you want to use the old (less efficient) format, you can create your saver as `tf.train.Saver(..., write_version=tf.train.SaverDef.V1)`. – mrry Dec 12 '16 at 22:32
  • @mrry: How can we load the graph and the variables from these 3 files in C++? The label_image example loads the model from a single .pb file. – mohaghighat Feb 03 '17 at 22:46