
I'm trying to train a very basic, small LSTM model in TensorFlow on a GTX 1080 (although I've tried other cards, too). Depending on some parameters (like the hidden state size), I get a ResourceExhaustedError after a fairly consistent number of iterations.

The model isn't much more than an embedding matrix (ca. 5000×300) and a single-layer LSTM, with a final dense projection at every timestep. I have tried batch sizes as small as 1 and a hidden state size of just 20, but I still run out of memory on an 8 GB card, with a total training data size of 5M.
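
Roughly, the model looks like this (a simplified sketch, not my actual training code; the names, placeholder shapes, and sequence-length handling are only illustrative):

```python
import tensorflow as tf

# Illustrative sizes from the question: vocab ~5000, embedding dim 300,
# hidden size 20, 6 output classes (not the real code).
vocab_size, embed_dim, hidden_size, num_classes = 5000, 300, 20, 6

inputs = tf.placeholder(tf.int32, [None, None], name="inputs")   # [batch, time]
seq_len = tf.placeholder(tf.int32, [None], name="seq_len")       # per-example lengths

embedding = tf.get_variable("embedding", [vocab_size, embed_dim])
embedded = tf.nn.embedding_lookup(embedding, inputs)              # [batch, time, 300]

cell = tf.contrib.rnn.LSTMCell(hidden_size)
outputs, _ = tf.nn.dynamic_rnn(cell, embedded, sequence_length=seq_len,
                               dtype=tf.float32)                  # [batch, time, 20]

# dense projection applied at every timestep
logits = tf.layers.dense(outputs, num_classes)                    # [batch, time, 6]
```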

I can't wrap my head around why this is happening. I've obviously tried other suggestions for related problems discussed on Stack Overflow, including reducing the per_process_gpu_memory_fraction in the TF GPU options, but to no avail.
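
What I tried there looks roughly like this (the fraction value is only an example):

```python
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # example value, I've tried several
sess = tf.Session(config=config)
```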

See code here:

[This doesn't include some utility scripts, so it won't run on its own. I also deleted some functions for the sake of brevity. The code is designed for multi-task learning, which introduces some overhead here, but the memory problems persist in single-task setups.]

PS: one thing that I know I'm not doing 100% efficiently is storing all training data as a numpy array, then sampling from there and using TF's feed_dict to provide the data to my model. I believe this can slow down computation to some degree, but it shouldn't cause such severe memory issues, right?
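
In other words, the training loop follows roughly this pattern (simplified; `labels`, `lengths` and `train_op` stand in for things defined elsewhere in my script):

```python
import numpy as np

# all training data is held in host RAM as numpy arrays (train_x, train_y, lengths)
for step in range(num_steps):
    idx = np.random.randint(0, train_x.shape[0], size=batch_size)
    sess.run(train_op, feed_dict={inputs: train_x[idx],
                                  seq_len: lengths[idx],
                                  labels: train_y[idx]})
```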

Joko
  • The data stored as numpy would live in your RAM, which may or may not be a problem. When training, monitor the RAM with `htop` or some other tool; does the memory fill up? My hypothesis is that the GPU memory fills, not the RAM. With LSTMs, the issue is usually not the number of units *in* the LSTM, since LSTMs share parameters, but the output projection layer. You are projecting from how many units to how many units? If, say, you predict a vocab of size 40k from 128 hidden units of the LSTM, that's a staggering 5.12 million parameters in just one layer. – vega Mar 29 '17 at 15:39
  • 1
    Thanks for your comment, vega! The minimal setup I've tried used 20 hidden units in the LSTMs and 6 output classes, so that shouldn't be the problem. And yes, it's the GPU memory that's filing up. – Joko Mar 29 '17 at 15:45
  • Are you running only one graph at a time? I would also invite you to post the full code, to see whether the problem is the code or something else. – vega Mar 29 '17 at 15:47
  • Yeah, just one graph at a time. I'll post an edited (reduced) version of my code in a bit! – Joko Mar 29 '17 at 16:43
  • It seems like a partial answer to this problem is hidden in this thread: Tensorflow OOM on GPU. Concretely, setting `config.gpu_options.allocator_type = 'BFC'` makes me run into these problems much more rarely. – Joko Mar 30 '17 at 10:55
  • 1
    Can you output the graph to Tensorboard right after construction so you can be sure the graph looks as expected? As I can see, there is a case when the input is passed directly to LSTM (without embedding to 300d) which after unrolling the LSTM may not fit? – Mihail Burduja Mar 30 '17 at 12:21

0 Answers