I'm running the following program, and each time I hit the 'build' API call I see roughly another 1 GB of memory taken up after the request completes. I'm trying to eliminate everything from memory, but I'm not sure what remains.

import tensorflow as tf
import tflearn
from flask import Flask, jsonify
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression

app = Flask(__name__)

keep_prob = .8
num_labels = 3
batch_size = 64

class AlexNet():

    def __init__(self):

        @app.route('/build')
        def build():
            g = tf.Graph()
            with g.as_default():
                sess = tf.Session()

                # Building 'AlexNet'
                network = input_data(shape=[None, 227, 227, 3])
                network = conv_2d(network, 96, 11, strides=4, activation='relu')
                network = max_pool_2d(network, 3, strides=2)
                network = local_response_normalization(network)
                network = conv_2d(network, 256, 5, activation='relu')
                network = max_pool_2d(network, 3, strides=2)
                network = local_response_normalization(network)
                network = conv_2d(network, 384, 3, activation='relu')
                network = conv_2d(network, 384, 3, activation='relu')
                network = conv_2d(network, 256, 3, activation='relu')
                network = max_pool_2d(network, 3, strides=2)
                network = local_response_normalization(network)
                network = fully_connected(network, 4096, activation='tanh')
                network = dropout(network, keep_prob)
                network = fully_connected(network, 4096, activation='tanh')
                network = dropout(network, keep_prob)
                network = fully_connected(network, num_labels, activation='softmax')
                network = regression(network, optimizer="adam",
                                     loss='categorical_crossentropy',
                                     learning_rate=0.001, batch_size=batch_size)

                model = tflearn.DNN(network, tensorboard_dir="./tflearn_logs/",
                                    checkpoint_path=None, tensorboard_verbose=0, session=sess)

                sess.run(tf.initialize_all_variables())  # the memory growth reportedly happens here
                sess.close()

            tf.reset_default_graph()

            # Attempt to release everything created during the request
            del g
            del sess
            del model
            del network
            return jsonify(status=200)


if __name__ == "__main__":
    AlexNet()
    app.run(host='0.0.0.0', port=5000, threaded=True)
bradden_gross
  • The memory allocation happens here: `sess.run(tf.initialize_all_variables())` – bradden_gross Jul 31 '16 at 17:48
  • maybe try `free && sync && echo 3 > /proc/sys/vm/drop_caches && free` – Yaroslav Bulatov Jul 31 '16 at 18:49
  • I'm running this locally on a Mac, so I'm not sure what the equivalent command is. – bradden_gross Jul 31 '16 at 19:35
  • Does the Activity Monitor show which process has the memory? If not, is it possible the memory is not really taken? (i.e., `free` on Linux underestimates the memory that's available for other processes) – Yaroslav Bulatov Jul 31 '16 at 19:37
  • Activity Monitor shows the process growing with each run, and it crashes with an OOM exception after long runs. The process using the memory is the Python process. – bradden_gross Jul 31 '16 at 19:40

1 Answer

I'm not sure if you've already found the answer, but IMHO you are not supposed to put long-running tasks in an HTTP request handler. HTTP is stateless and is supposed to respond to the call almost immediately; that's why we have the concepts of task queues, async tasks, etc. The rule of thumb in server-side development is to respond to the request as quickly as possible. Trying to build a convolutional deep neural network inside an HTTP request is simply not feasible: an ideal HTTP request should respond within a couple of seconds, and building and running your DNN classifier session can take far longer than that.
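
As a rough sketch of the task-queue idea (not from the question's code; `build_queue`, `worker`, and `build_model` are illustrative names, with `build_model` standing in for the graph construction above), the handler only enqueues the job and returns, while one long-lived worker does the heavy lifting:

import queue
import threading

from flask import Flask, jsonify

app = Flask(__name__)
build_queue = queue.Queue()

def build_model():
    # Placeholder: the tf.Graph()/tflearn construction from the question would go here.
    pass

def worker():
    # One long-lived worker drains the queue so request handlers never block.
    while True:
        job = build_queue.get()
        try:
            job()
        finally:
            build_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

@app.route('/build')
def build():
    build_queue.put(build_model)  # enqueue the job and return right away
    return jsonify(status=202)    # 202: accepted, not finished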

The hackiest solution would be to create a Python thread within the request and let the request respond to the HTTP call without blocking. Meanwhile, your thread can go ahead and build your model. You can then write the model somewhere, send a mail notification, etc. (a minimal version is sketched below the link).

Here you go:

How can I add a background thread to flask?
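
For illustration, a minimal fire-and-forget version of that approach might look like this (an untested sketch; `build_model` again stands in for the graph-building code from the question):

import threading

from flask import Flask, jsonify

app = Flask(__name__)

def build_model():
    # Long-running work: the tf.Graph()/tflearn construction from the question.
    pass

@app.route('/build')
def build():
    # Fire-and-forget: the response returns while the build continues in the background.
    threading.Thread(target=build_model, daemon=True).start()
    return jsonify(status=202)

Note that the thread still lives in the same Python process, so this keeps the handler responsive but would not, by itself, release the memory described in the question; a separate worker process or an external task queue is the usual choice for jobs this heavy.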

Hakan