I am working with a pretrained Keras model and I want to run it on a TPU in Google Colaboratory, but I get the following error:
ValueError: Layer has a variable shape in a non-batch dimension. TPU models must have constant shapes for all operations.
You may have to specify 'input_length' for RNN/TimeDistributed layers.
Layer:
Input shape: [(None, 128, 768), (None, 1)]
Output shape: (None, None, 768)
I'm working with keras-xlnet. As I understand it, the TPU needs a fixed batch size when the model is compiled, as explained here and here.
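For reference, this is roughly what a fully fixed input shape looks like in tf.keras (a minimal sketch; BATCH_SIZE and SEQ_LEN are placeholders for my actual hyperparameters, and the snippets below use the same names):

import tensorflow as tf

BATCH_SIZE = 64  # placeholder value
SEQ_LEN = 128    # placeholder; matches the 128 in the error's input shape

# batch_size pins the batch dimension, so no dimension is left as None
token_ids = tf.keras.layers.Input(shape=(SEQ_LEN,), batch_size=BATCH_SIZE, dtype='int32')
print(token_ids.shape)  # (64, 128): constant in every dimension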
The model is loaded from a checkpoint:

import os
from keras_xlnet import Tokenizer, load_trained_model_from_checkpoint, ATTENTION_TYPE_BI

checkpoint_path = 'xlnet_cased_L-12_H-768_A-12'
tokenizer = Tokenizer(os.path.join(checkpoint_path, 'spiece.model'))

# Build the XLNet graph and restore the pretrained weights
model = load_trained_model_from_checkpoint(
    config_path=os.path.join(checkpoint_path, 'xlnet_config.json'),
    checkpoint_path=os.path.join(checkpoint_path, 'xlnet_model.ckpt'),
    batch_size=BATCH_SIZE,
    memory_len=512,
    target_len=SEQ_LEN,
    in_train_phase=False,
    attention_type=ATTENTION_TYPE_BI,
)
model.summary()
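The shapes in the error can also be seen directly on the model's symbolic tensors (a quick diagnostic; model.inputs and model.outputs are standard Keras attributes):

# Locate which dimension is variable (None) besides the batch dimension
for t in model.inputs:
    print('input :', t.shape)
for t in model.outputs:
    print('output:', t.shape)
# The offending output prints as (None, None, 768); the second, non-batch
# dimension is the variable one the TPU conversion complains about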
The model is then compiled (after a few changes):
from keras_bert import AdamWarmup, calc_train_steps

# Optimizer schedule: linear warmup followed by linear decay
decay_steps, warmup_steps = calc_train_steps(
    y_train.shape[0],
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
)
model.compile(
    AdamWarmup(decay_steps=decay_steps, warmup_steps=warmup_steps, lr=LR),
    loss='binary_crossentropy',
)
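As a sanity check, the computed schedule can be inspected; the two values are the decay horizon and the number of warmup steps:

print('decay_steps :', decay_steps)
print('warmup_steps:', warmup_steps)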
Then the model is converted to a TPU model, which is where the error occurs:
import tensorflow as tf
from keras_bert import get_custom_objects  # registers the custom layers for (de)serialization

tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
strategy = tf.contrib.tpu.TPUDistributionStrategy(
    tf.contrib.cluster_resolver.TPUClusterResolver(tpu=tpu_address)
)
with tf.keras.utils.custom_object_scope(get_custom_objects()):
    tpu_model = tf.contrib.tpu.keras_to_tpu_model(model, strategy=strategy)
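For context, the step that would follow (never reached because of the error) is plain Keras-style training on the returned TPU model; x_train here is a stand-in for my tokenized inputs:

tpu_model.fit(x_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCHS)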
Is there a way I can fix my batch size at compile time to get rid of the error above? Or is the problem something else entirely?