
I'm trying to train a very basic, small LSTM model in TensorFlow on a GTX 1080 (although I've tried other cards, too). Depending on some parameters (like the hidden state size), I get a ResourceExhaustedError after a fairly consistent number of iterations.

The model isn't much more than an embedding matrix (ca. 5000×300) and a single-layer LSTM, with a final dense projection at every timestep. I have tried batch sizes as small as 1 and a hidden state size of just 20, but I still run out of memory on an 8 GB card, with a total training data size of 5M.
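
Roughly, the model looks like this (a simplified sketch, not my actual training code; the names, placeholder shapes, and sequence-length handling are only illustrative):

```python
import tensorflow as tf

# Illustrative sizes from the question: vocab ~5000, embedding dim 300,
# hidden size 20, 6 output classes (not the real code).
vocab_size, embed_dim, hidden_size, num_classes = 5000, 300, 20, 6

inputs = tf.placeholder(tf.int32, [None, None], name="inputs")   # [batch, time]
seq_len = tf.placeholder(tf.int32, [None], name="seq_len")       # per-example lengths

embedding = tf.get_variable("embedding", [vocab_size, embed_dim])
embedded = tf.nn.embedding_lookup(embedding, inputs)              # [batch, time, 300]

cell = tf.contrib.rnn.LSTMCell(hidden_size)
outputs, _ = tf.nn.dynamic_rnn(cell, embedded, sequence_length=seq_len,
                               dtype=tf.float32)                  # [batch, time, 20]

# dense projection applied at every timestep
logits = tf.layers.dense(outputs, num_classes)                    # [batch, time, 6]
```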

I can't wrap my head around why this is happening. I've obviously tried other suggestions for related problems discussed on Stack Overflow, including reducing the per_process_gpu_memory_fraction in the TF GPU options, but to no avail.
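
What I tried there looks roughly like this (the fraction value is only an example):

```python
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # example value, I've tried several
sess = tf.Session(config=config)
```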

See code here:

[This doesn't include some utility scripts, so it won't run on its own. I also deleted some functions for the sake of brevity. The code is designed for multi-task learning, which introduces some overhead here, but the memory problems persist in single-task setups.]

PS: one thing that I know I'm not doing 100% efficiently is storing all training data as a numpy array, then sampling from there and using TF's feed_dict to provide the data to my model. I believe this can slow down computation to some degree, but it shouldn't cause such severe memory issues, right?
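
In other words, the training loop follows roughly this pattern (simplified; `labels`, `lengths` and `train_op` stand in for things defined elsewhere in my script):

```python
import numpy as np

# all training data is held in host RAM as numpy arrays (train_x, train_y, lengths)
for step in range(num_steps):
    idx = np.random.randint(0, train_x.shape[0], size=batch_size)
    sess.run(train_op, feed_dict={inputs: train_x[idx],
                                  seq_len: lengths[idx],
                                  labels: train_y[idx]})
```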

Joko
  • The data stored as numpy would live in your RAM, which may or may not be a problem. When training, monitor the RAM with `htop` or some other tool; does the memory fill up? My hypothesis is that the GPU memory fills, not the RAM. With LSTMs, the issue is usually not the number of units *in* the LSTM, since LSTMs share parameters, but the output projection layer. You are projecting from how many units to how many units? If, say, you predict a vocab of size 40k from 128 hidden units of the LSTM, that's a staggering 5.12 million parameters in just one layer. – vega Mar 29 '17 at 15:39
  • 1
    Thanks for your comment, vega! The minimal setup I've tried used 20 hidden units in the LSTMs and 6 output classes, so that shouldn't be the problem. And yes, it's the GPU memory that's filing up. – Joko Mar 29 '17 at 15:45
  • Are you running only one graph at a time? I would also invite you to post the full code, to see whether the problem is the code or something else. – vega Mar 29 '17 at 15:47
  • Yeah, just one graph at a time. I'll post an edited (reduced) version of my code in a bit! – Joko Mar 29 '17 at 16:43
  • It seems like a partial answer to this problem is hidden in this thread: Tensorflow OOM on GPU. Concretely, setting `config.gpu_options.allocator_type = 'BFC'` makes me run into these problems much more rarely. – Joko Mar 30 '17 at 10:55
  • 1
    Can you output the graph to Tensorboard right after construction so you can be sure the graph looks as expected? As I can see, there is a case when the input is passed directly to LSTM (without embedding to 300d) which after unrolling the LSTM may not fit? – Mihail Burduja Mar 30 '17 at 12:21

0 Answers