
I am learning machine translation in Keras using the code from this article. The article's code works fine on GPU and CPU as-is.

Now I want to take advantage of Google Colab TPUs. The code doesn't run on a TPU as-is, so I need to move it toward TensorFlow's own API.

Following a Fashion MNIST notebook for TPUs, I use the Keras layers bundled with TensorFlow (tf.keras) rather than standalone Keras on top of TensorFlow. Before getting to the TPU part, I am doing this conversion to see if the code still runs on GPU. This mainly means changing this function, from:

from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Embedding
from keras.layers import RepeatVector
from keras.layers import TimeDistributed
# define NMT model
def define_model(src_vocab, tar_vocab, src_timesteps, tar_timesteps, n_units):
    model = Sequential()
    model.add(Embedding(src_vocab, n_units, input_length=src_timesteps, mask_zero=True))
    model.add(LSTM(n_units))
    model.add(RepeatVector(tar_timesteps))
    model.add(LSTM(n_units, return_sequences=True))
    model.add(TimeDistributed(Dense(tar_vocab, activation='softmax')))
    return model

to:

import tensorflow as tf
# define NMT model
def define_model(src_vocab, tar_vocab, src_timesteps, tar_timesteps, n_units):
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Embedding(src_vocab, n_units, input_length=src_timesteps, mask_zero=True))
    model.add(tf.keras.layers.LSTM(n_units))
    model.add(tf.keras.layers.RepeatVector(tar_timesteps))
    model.add(tf.keras.layers.LSTM(n_units, return_sequences=True))
    model.add(tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(tar_vocab, activation='softmax')))
    return model

Then I do

model = define_model(swh_vocab_size, eng_vocab_size, swh_length, eng_length, 256)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(trainX, trainY, epochs=1, batch_size=64, validation_data=(testX, testY), callbacks=[checkpoint], verbose=2)

However, this produces a warning when I run fit:

lib\site-packages\tensorflow\python\ops\gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

Then, during fit on the GPU, it fails on a BLAS GEMM launch as follows:

InternalError: Blas GEMM launch failed : a.shape=(64, 256), b.shape=(256, 256), m=64, n=256, k=256
     [[{{node lstm/while/MatMul}} = MatMul[T=DT_FLOAT, _class=["loc:@training/Adam/gradients/lstm/while/strided_slice_grad/StridedSliceGrad"], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](lstm/while/TensorArrayReadV3, lstm/while/strided_slice)]]
     [[{{node loss/time_distributed_loss/broadcast_weights/assert_broadcastable/AssertGuard/Assert/Switch/_175}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2728_...ert/Switch", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

This is before any conversion to a TPU model. I'm just trying to make sure things still run on CPU and GPU before getting to the final TPU conversion, and they don't. Any thoughts on why I can't get this far?
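One commonly reported cause of "Blas GEMM launch failed" is that another process (for example a second notebook kernel) already holds most of the GPU memory. That may or may not be what is happening here. The sketch below is only a diagnostic under that assumption: it shells out to `nvidia-smi`, if it is on PATH, and reports used and total memory per GPU. The helper names `parse_gpu_memory` and `query_gpu_memory` are made up for this example.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse lines of 'used, total' (MiB) into a list of (used, total) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        try:
            used, total = (int(field) for field in fields)
        except ValueError:
            # Skip lines that are not two integers (e.g. "[N/A]" entries).
            continue
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Return [(used_mib, total_mib), ...], or None if nvidia-smi is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # nvidia-smi is not on PATH
    result = subprocess.run(
        [exe, "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    print(query_gpu_memory())
```

If the used figure is already close to the total before fit starts, closing other Python sessions is worth trying before reinstalling anything.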

Lars Ericson

1 Answer


I think some of this may come down to a careful Anaconda Python install on Windows. Here is what I believe is the correct sequence (assuming, as in my case, that you already have CUDA 9.0 and cuDNN installed):

Install a version of Visual Studio that matches the one used to build TensorFlow, per this question. Add the path

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC 

to PATH.

Also, run vcvarsall in a script before running Python. Then:

  1. Start a CMD window using Run As Administrator. This is crucial.
  2. conda create --name myenv
  3. conda activate myenv
  4. conda install tensorflow-gpu
  5. conda install mingw
  6. conda install libpython
  7. conda install mkl-service

I will mark this as correct after some more testing. Steps 3 and 4 come from this question, and the idea of starting from scratch and strictly using conda install rather than pip install comes from this question.
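After following the steps above, a quick check from inside the new environment can confirm that TensorFlow imports and reports a GPU. This is a minimal sketch, not part of the original procedure. The function name `tensorflow_gpu_report` is made up here, and it falls back to `tf.test.is_gpu_available()` on older TensorFlow versions that lack `tf.config.list_physical_devices`.

```python
import importlib.util


def tensorflow_gpu_report():
    """Summarise whether TensorFlow imports and whether it reports a GPU."""
    report = {
        "tensorflow_installed": False,
        "version": None,
        "gpu_visible": None,
        "import_error": None,
    }
    if importlib.util.find_spec("tensorflow") is None:
        return report  # package not present in this environment
    try:
        import tensorflow as tf
    except ImportError as exc:
        # On Windows this can indicate a missing runtime DLL.
        report["import_error"] = str(exc)
        return report
    report["tensorflow_installed"] = True
    report["version"] = tf.__version__
    config = getattr(tf, "config", None)
    if config is not None and hasattr(config, "list_physical_devices"):
        # Newer TensorFlow versions
        report["gpu_visible"] = len(config.list_physical_devices("GPU")) > 0
    else:
        # Older TensorFlow versions
        report["gpu_visible"] = bool(tf.test.is_gpu_available())
    return report


if __name__ == "__main__":
    print(tensorflow_gpu_report())
```

If `import_error` is set, the problem is in the install itself, not in the model code. If `gpu_visible` is False, the CUDA/cuDNN setup is the first thing to recheck.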

Lars Ericson