16

I'm using Keras with tensorflow as backend. I have one compiled/trained model.

My prediction loop is slow so I would like to find a way to parallelize the predict_proba calls to speed things up. I would like to take a list of batches (of data) and then per available gpu, run model.predict_proba() over a subset of those batches.
Essentially:

data = [ batch_0, batch_1, ... , batch_N ]
on gpu_0 => return predict_proba(batch_0)
on gpu_1 => return predict_proba(batch_1)
...
on gpu_N => return predict_proba(batch_N) 

I know that it's possible in pure Tensorflow to assign ops to a given gpu (https://www.tensorflow.org/tutorials/using_gpu). However, I don't know how this translates to my situation since I've built/compiled/trained my model using Keras' api.

I had thought that maybe I just needed to use python's multiprocessing module and start a process per gpu that would run predict_proba(batch_n). I know this is theoretically possible given another SO post of mine: Keras + Tensorflow and Multiprocessing in Python. However, this still leaves me with the dilemma of not knowing how to actually "choose" a gpu to operate the process on.

My question boils down to: how does one parallelize prediction for one model in Keras across multiple gpus when using Tensorflow as Keras' backend?

Additionally I am curious if similar parallelization for prediction is possible with only one gpu.

A high level description or code example would be greatly appreciated!

Thanks!

Community
  • 1
  • 1
John Cast
  • 1,771
  • 3
  • 18
  • 40

2 Answers2

6

I created one simple example to show how to run keras model across multiple gpus. Basically, multiple processes are created and each of process owns a gpu. To specify the gpu id in process, setting env variable CUDA_VISIBLE_DEVICES is a very straightforward way (os.environ["CUDA_VISIBLE_DEVICES"]). Hope this git repo can help you.

https://github.com/yuanyuanli85/Keras-Multiple-Process-Prediction

VictorLi
  • 211
  • 3
  • 4
1

You can use this function to parallelize a Keras model (credits to kuza55).
https://github.com/kuza55/keras-extras/blob/master/utils/multi_gpu.py
.

from keras.layers import merge
from keras.layers.core import Lambda
from keras.models import Model

import tensorflow as tf

def make_parallel(model, gpu_count):
    def get_slice(data, idx, parts):
        shape = tf.shape(data)
        size = tf.concat([ shape[:1] // parts, shape[1:] ],axis=0)
        stride = tf.concat([ shape[:1] // parts, shape[1:]*0 ],axis=0)
        start = stride * idx
        return tf.slice(data, start, size)

    outputs_all = []
    for i in range(len(model.outputs)):
        outputs_all.append([])

    #Place a copy of the model on each GPU, each getting a slice of the batch
    for i in range(gpu_count):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('tower_%d' % i) as scope:

                inputs = []
                #Slice each input into a piece for processing on this GPU
                for x in model.inputs:
                    input_shape = tuple(x.get_shape().as_list())[1:]
                    slice_n = Lambda(get_slice, output_shape=input_shape, arguments={'idx':i,'parts':gpu_count})(x)
                    inputs.append(slice_n)                

                outputs = model(inputs)

                if not isinstance(outputs, list):
                    outputs = [outputs]

                #Save all the outputs for merging back together later
                for l in range(len(outputs)):
                    outputs_all[l].append(outputs[l])

    # merge outputs on CPU
    with tf.device('/cpu:0'):
        merged = []
        for outputs in outputs_all:
            merged.append(merge(outputs, mode='concat', concat_axis=0))

        return Model(input=model.inputs, output=merged)
Temak
  • 2,929
  • 36
  • 49
  • 1
    After training and saved the model as json, when reload model from json and prediction, it failed as not know tf, how to fix it or what is the best solution? – bygreencn Jul 14 '17 at 09:52
  • Yes, this is known issue with this code. When you run this function - you change the graph. Thus you need to save only one branch to be able to load it afterwards for prediction on single gpu – Temak Jul 14 '17 at 12:34