
I am programming a relatively small LSTM model in Google Colab.

For reference, I am using TensorFlow 1.13 to build the model, with tensorflow.keras as the Keras API.

from tensorflow.keras import layers as ll
from tensorflow.keras.models import Model

seq_len = 20000; n_classes = 4
inputs = ll.Input(shape=(seq_len,))
x = ll.Embedding(len(word_index), 1000)(inputs)
x = ll.LSTM(units=100, activation='relu', return_sequences=True)(x)
outputs = ll.Dense(units=n_classes, activation='softmax')(x)
model = Model(inputs, outputs)
model.summary()
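For context, the trainable-parameter count of everything except the Embedding layer can be checked by hand (a sketch in plain Python; the Embedding count depends on len(word_index), which is not shown above):

```python
# Back-of-envelope parameter count for the model above (excluding the Embedding,
# whose size depends on the vocabulary).
emb_dim, units, n_classes = 1000, 100, 4

# An LSTM has 4 gates, each with an input kernel, a recurrent kernel, and a bias.
lstm_params = 4 * ((emb_dim + units) * units + units)

# The Dense softmax head maps each timestep's `units` features to n_classes.
dense_params = units * n_classes + n_classes

print(lstm_params, dense_params)  # 440400 404
```

Note that neither count depends on seq_len, which is why a huge sequence length can blow up memory while the parameter count stays modest.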

I have checked that I have 15 GB of GPU RAM available, and according to my estimates the model with a batch size of 32 should fit in 3 GB of RAM.

However, whenever I launch training, the server runs out of memory.

Admittedly, I am using extremely long sequences of data (20000 is the maximum sequence length), but I would expect the model to unroll symbolically in memory and just fit.
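A rough estimate of the forward-pass activation memory (plain Python, counting only layer outputs at float32; an unrolled graph would additionally keep per-timestep gate activations, multiplying the LSTM term many times over) shows why seq_len dominates:

```python
# Rough activation-memory estimate for one forward pass (float32 = 4 bytes).
batch, seq_len, emb_dim, units, n_classes = 32, 20000, 1000, 100, 4
BYTES = 4

emb_out = batch * seq_len * emb_dim * BYTES      # embedding output
lstm_out = batch * seq_len * units * BYTES       # LSTM output (return_sequences=True)
dense_out = batch * seq_len * n_classes * BYTES  # per-timestep softmax output

print(emb_out, lstm_out, dense_out)  # 2560000000 256000000 10240000
```

The embedding output alone is ~2.56 GB per batch, so any unrolling or duplication of intermediate tensors quickly exhausts a 15 GB GPU.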

Reducing the batch size to 1 does not help either.

What is going on? How can I make this model fit in memory?

EDIT: I tried reducing the sequence length to 2, and that indeed makes it fit in memory. But I need the sequence length to remain high. How can I tell TensorFlow not to unroll the network at any point? (I suspect that is what is going on behind the scenes; how can I check whether this is indeed the case?)

EDIT: If I remove the softmax layer, memory use drops back to the normal range. I think the softmax layer is causing TensorFlow to unroll the network. TimeDistributing the softmax does not help, though.

Jsevillamol
  • please include the output of model.summary(), the number of parameters is probably huge. Experiment with different sequence lengths and see the change in number of parameters. – Dr. Snoopy Apr 25 '19 at 16:53
  • @MatiasValdenegro the number of parameters is quite within the range of what I usually work with: `1,474,404` parameters – Jsevillamol Apr 26 '19 at 09:01

1 Answer


Changing the LSTM layer to the CuDNNLSTM layer did the trick!

from tensorflow.keras import layers as ll
from tensorflow.keras.models import Model

inputs = ll.Input(shape=(seq_len,))
x = ll.Embedding(len(word_index), 1024)(inputs)
# CuDNNLSTM uses the fused cuDNN kernel; its activation is fixed to tanh,
# so the activation='relu' from the original LSTM cannot be kept.
x = ll.CuDNNLSTM(units=100, return_sequences=True)(x)
outputs = ll.Dense(units=n_classes, activation='softmax')(x)
model = Model(inputs, outputs)
Jsevillamol