6

A follow up to this question:

How to save a Tensorflow Checkpoint file from Google Colaboratory when using TPU mode?

There, the official way of saving a checkpoint when using a TensorFlow TPU is to use Google Cloud Storage (GCS).

I am wondering if there is a workaround for those who do not wish to use GCS. Perhaps, for each variable, call .eval() and save the resulting value. Then, when restoring, use the saved value as the 'init' value for each variable.

A major issue I foresee, though, is saving and loading the optimizer's parameters.

For Keras, the weights do seem to get copied from the TPU to the local machine, as the log line from this notebook suggests:

https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/shakespeare_with_tpu_and_keras.ipynb

INFO:tensorflow:Copying TPU weights to the CPU

So I imagine there is a general workaround too, one that does not require Keras.

Bob Smith
SantoshGupta7

2 Answers

2

Take a look at THIS CODE from Keras

If I understood correctly, the weights are not saved directly from the TPU. Instead, they are synced to the CPU and then saved to Colab storage.
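The sync-then-save idea can also be tried by hand. Below is a minimal sketch, assuming TensorFlow 2.x with eager execution and NumPy; the `variables` dict and its names are hypothetical stand-ins for a real model's variables, and this is not the Keras code referred to above. It has not been verified on a TPU runtime, and it only covers the variables you list explicitly, so optimizer state would have to be added to the dict in the same way.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for a model's trainable variables.
variables = {
    "kernel": tf.Variable([[1.0, 2.0], [3.0, 4.0]]),
    "bias": tf.Variable([0.5, -0.5]),
}

# 1. Copy every variable to host memory as a NumPy array.
host_values = {name: var.numpy() for name, var in variables.items()}

# 2. Write the arrays to local storage in a single .npz file.
path = os.path.join(tempfile.mkdtemp(), "weights.npz")
np.savez(path, **host_values)

# Overwrite the variables to simulate starting from a fresh model.
for var in variables.values():
    var.assign(tf.zeros_like(var))

# 3. Load the arrays back and assign them to the matching variables.
with np.load(path) as saved:
    for name, var in variables.items():
        var.assign(saved[name])

restored_kernel = variables["kernel"].numpy().tolist()
restored_bias = variables["bias"].numpy().tolist()
```

Matching by name keeps the restore step independent of variable creation order, at the cost of having to keep the names stable between runs.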

EDIT

Also see: this answer.

alex
-1

I found the solution below after seeing this thread, so I wanted to add it as an option. According to the TensorFlow documentation, the save/load/restore functions in Keras, as well as tf.train.Checkpoint and the save method of tf.train.CheckpointManager, accept an options argument. Through it you can set the experimental_io_device field to '/job:localhost', so that files are read and written on the local host.

Copying their code example:

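import tensorflow as tf

# get_model() is a placeholder from the documentation example;
# it should return a Keras model.
model = get_model()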

# Saving the model to a path on localhost.
saved_model_path = '/tmp/tf_save'
save_options = tf.saved_model.SaveOptions(experimental_io_device='/job:localhost')
model.save(saved_model_path, options=save_options)

# Loading the model from a path on localhost.
another_strategy = tf.distribute.MirroredStrategy()
with another_strategy.scope():
  load_options = tf.saved_model.LoadOptions(experimental_io_device='/job:localhost')
  loaded = tf.keras.models.load_model(saved_model_path, options=load_options)
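The same option can be passed to tf.train.Checkpoint, which is useful when you are not saving a whole Keras model. Here is a minimal sketch, assuming TensorFlow 2.3 or later; the two variables are hypothetical stand-ins for model weights and a training step counter, and the example runs on a plain CPU machine rather than a TPU. Any trackable object handed to the checkpoint, including an optimizer, should be saved the same way.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical stand-ins for model weights and a training step counter.
weights = tf.Variable([1.0, 2.0, 3.0])
step = tf.Variable(0, dtype=tf.int64)

# Route checkpoint file I/O through the local host.
local_io = tf.train.CheckpointOptions(experimental_io_device="/job:localhost")

# Track the objects to save, then write the checkpoint to a local directory.
ckpt = tf.train.Checkpoint(weights=weights, step=step)
ckpt_dir = tempfile.mkdtemp()
save_path = ckpt.save(os.path.join(ckpt_dir, "ckpt"), options=local_io)

# Overwrite the values to simulate a fresh start.
weights.assign([0.0, 0.0, 0.0])
step.assign(99)

# Restore from the local path and check that every saved value was matched.
ckpt.restore(save_path, options=local_io).assert_consumed()

restored_weights = weights.numpy().tolist()
restored_step = int(step.numpy())
```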

Documentation Sources:

John St. John