
I am working with a pretrained Keras model and I want to run it on a TPU in Google Colaboratory, but I get the following error:

ValueError: Layer has a variable shape in a non-batch dimension. TPU models must have constant shapes for all operations.

You may have to specify 'input_length' for RNN/TimeDistributed layers.

Layer: Input shape: [(None, 128, 768), (None, 1)] Output shape: (None, None, 768)

I'm working with keras-xlnet. As I understand it, the TPU needs a fixed batch size when the model is compiled, as explained here and here.
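
For reference, my understanding of what a "constant shape" means in plain tf.keras is something like the snippet below (just an illustration, not the keras-xlnet API, which builds its input layers internally; BATCH_SIZE and SEQ_LEN are constants from my notebook):

import tensorflow as tf

# Illustration only: batch_size pins the batch dimension and shape pins the
# rest, so every tensor in the graph has a fully constant shape.
inputs = tf.keras.layers.Input(batch_size=BATCH_SIZE, shape=(SEQ_LEN, 768))
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(inputs)
fixed_model = tf.keras.Model(inputs, outputs)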

The model is loaded from a checkpoint:

import os

from keras_xlnet import (Tokenizer, load_trained_model_from_checkpoint,
                         ATTENTION_TYPE_BI)

checkpoint_path = 'xlnet_cased_L-12_H-768_A-12'

tokenizer = Tokenizer(os.path.join(checkpoint_path, 'spiece.model'))
model = load_trained_model_from_checkpoint(
    config_path=os.path.join(checkpoint_path, 'xlnet_config.json'),
    checkpoint_path=os.path.join(checkpoint_path, 'xlnet_model.ckpt'),
    batch_size=BATCH_SIZE,
    memory_len=512,
    target_len=SEQ_LEN,
    in_train_phase=False,
    attention_type=ATTENTION_TYPE_BI,
    )
model.summary()

The model is then compiled (after a few changes):

from keras_bert import AdamWarmup, calc_train_steps

decay_steps, warmup_steps = calc_train_steps(
    y_train.shape[0],
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    )


model.compile(
    AdamWarmup(decay_steps=decay_steps, warmup_steps=warmup_steps, lr=LR),
    loss='binary_crossentropy',
    )

Then the model is converted to a TPU model, which is where the error occurs:

tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
strategy = tf.contrib.tpu.TPUDistributionStrategy(
    tf.contrib.cluster_resolver.TPUClusterResolver(tpu=tpu_address)
)

with tf.keras.utils.custom_object_scope(get_custom_objects()):
    tpu_model = tf.contrib.tpu.keras_to_tpu_model(model, strategy=strategy)
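
For completeness, if the conversion succeeded I would then train along these lines (just a sketch; x_train is my tokenized input, and the batch is sharded across the 8 TPU cores, so BATCH_SIZE should be divisible by 8):

tpu_model.fit(x_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCHS)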

Is there a way to fix my batch size at compile time to get rid of the error above, or is the problem something entirely different?

chefhose
    As a heads up, running on a TPU can be a big pain to resolve all the various bugs, mainly per TensorFlow's own outdated modules; even if you solve this one, it's quite likely there will be another one. Further, functionality is mainly limited to core TF modules (rather than e.g. custom optimizers). Lastly, a TPU is _very_ fast, but you won't see much of a gain unless your [input data pipeline](https://www.tensorflow.org/guide/data_performance) is highly efficient; data load time can easily become the bottleneck. – OverLordGoldDragon Nov 03 '19 at 14:38
  • Thanks for your comment. I think my input pipeline etc. works well because it is essentially the same setup I used with [keras-bert](https://github.com/CyberZHG/keras-bert). The GPU is a lot slower with my training set (with BERT), so slow that it does not make sense to run it on the whole dataset. – chefhose Nov 03 '19 at 19:15
  • Fair, then it should be worth debugging the TPU – OverLordGoldDragon Nov 03 '19 at 20:00
  • I'm afraid you need to rewrite the model yourself, with fixed shapes and copy the weights. But things like adding information of the `model.summary()` to this question (including submodels, if any) will certainly help to detect something more evident. – Daniel Möller Nov 04 '19 at 14:22
  • model summary is found [here](https://gist.github.com/KimBue/e6510be0b7b51084bfbb0ad0b486eb4e) as it is too big to upload here usefully. – chefhose Nov 04 '19 at 15:20

1 Answer


I agree with the comments - to get this to work you would need to adjust the variable output shapes (e.g. (None, None, 768)) to fixed sizes (other than the first, batch dimension). Maybe you could do this with simple padding. If you can loop through the saved model's layers and load their weights into a new model that you write with padded dimensions, it may even work, but I would say that's more trouble than it's worth, considering TPU-ready versions are already available.
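
Something along these lines, for example (a rough sketch; new_model here is a hypothetical rebuild of the same architecture with fixed, padded shapes and the same layer order):

# Copy weights layer by layer from the checkpoint-loaded model into a rebuilt
# model whose shapes are all constant. This only works if both models have the
# same layers in the same order with compatible weight shapes.
for src_layer, dst_layer in zip(model.layers, new_model.layers):
    dst_layer.set_weights(src_layer.get_weights())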

I suggest moving away from Keras for this model. The official TensorFlow XLNet implementation should work on TPUs without modification, and it comes with pretrained checkpoints: https://github.com/zihangdai/xlnet

It uses the standard TPUEstimator class to send the model function to the TPU worker, so you won't need to mess around with tf.contrib.tpu.keras_to_tpu_model.
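
Under the hood that path looks roughly like the sketch below (TF 1.x, heavily simplified; model_fn and train_input_fn are placeholders that run_classifier.py constructs for you, and the bucket path is just an example):

import os
import tensorflow as tf

resolver = tf.contrib.cluster_resolver.TPUClusterResolver(
    tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
run_config = tf.contrib.tpu.RunConfig(
    cluster=resolver,
    model_dir='gs://your-bucket/exp/imdb',  # placeholder bucket path
    tpu_config=tf.contrib.tpu.TPUConfig(iterations_per_loop=500),
)
estimator = tf.contrib.tpu.TPUEstimator(
    model_fn=model_fn,  # built by run_classifier.py, not shown here
    config=run_config,
    use_tpu=True,
    train_batch_size=32,
    eval_batch_size=8,
)
estimator.train(input_fn=train_input_fn, max_steps=4000)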

The classification example given in the repository can be run in Colab, where $TPU_NAME is $COLAB_TPU_ADDR and you upload the pretrained checkpoints and the IMDB data to a GCS bucket that Colab can access:

python run_classifier.py \
  --use_tpu=True \
  --tpu=${TPU_NAME} \
  --do_train=True \
  --do_eval=True \
  --eval_all_ckpt=True \
  --task_name=imdb \
  --data_dir=${IMDB_DIR} \
  --output_dir=${GS_ROOT}/proc_data/imdb \
  --model_dir=${GS_ROOT}/exp/imdb \
  --uncased=False \
  --spiece_model_file=${LARGE_DIR}/spiece.model \
  --model_config_path=${GS_ROOT}/${LARGE_DIR}/model_config.json \
  --init_checkpoint=${GS_ROOT}/${LARGE_DIR}/xlnet_model.ckpt \
  --max_seq_length=512 \
  --train_batch_size=32 \
  --eval_batch_size=8 \
  --num_hosts=1 \
  --num_core_per_host=8 \
  --learning_rate=2e-5 \
  --train_steps=4000 \
  --warmup_steps=500 \
  --save_steps=500 \
  --iterations=500
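
In a Colab cell you can export those variables from Python before invoking the script, e.g. (the bucket and directory names are placeholders for your own):

import os

os.environ['TPU_NAME'] = 'grpc://' + os.environ['COLAB_TPU_ADDR']
os.environ['GS_ROOT'] = 'gs://your-bucket'               # placeholder bucket
os.environ['IMDB_DIR'] = 'gs://your-bucket/aclImdb'      # placeholder IMDB data path
os.environ['LARGE_DIR'] = 'xlnet_cased_L-12_H-768_A-12'  # your checkpoint directory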

Tyler