
Background

I want to predict pathology images using Keras with Inception-ResNet-v2. I have already trained the model and have a .hdf5 file. Because a pathology image is very large (for example: 20,000 x 20,000 pixels), I have to scan the image and extract small patches for prediction.

I want to speed up the prediction procedure using the multiprocessing library with Python 2.7. The main idea is to use different subprocesses to scan different lines of the image and then send the patches to the model.

I have seen suggestions to import Keras and load the model inside each subprocess, but I don't think that is suitable for my task. Loading the model once with keras.models.load_model() takes about 47 s, which is very time-consuming, so I can't afford to reload the model every time I start a new subprocess.

Question

My question is: can I load the model in my main process and pass it as a parameter to the subprocesses?

I have tried two methods, but neither of them works.

Method 1. Using multiprocessing.Pool

The code is:

import keras
from keras.models import load_model
import multiprocessing

def predict(args):
    num, model = args  # Pool.map passes each tuple as a single argument
    print num
    print dir(model)
    model.predict("image data, type:list")

if __name__ == '__main__':
    model = load_model("path of hdf5 file")
    tasks = [(1, model), (2, model), (3, model), (4, model), (5, model), (6, model)]
    pool = multiprocessing.Pool(4)
    pool.map(predict, tasks)
    pool.close()
    pool.join()

The output is:

cPickle.PicklingError: Can't pickle <type 'module'>: attribute lookup __builtin__.module failed

I searched for the error and found that Pool can't map unpicklable parameters, so I tried Method 2.

Method 2. Using multiprocessing.Process

The code is:

import keras
from keras.models import load_model
import multiprocessing

def predict(num, model):
    print num
    print dir(model)
    model.predict("image data, type:list")

if __name__ == '__main__':
    model = load_model("path of hdf5 file")
    tasks = [(1, model), (2, model), (3, model), (4, model), (5, model), (6, model)]
    proc = []
    for i in range(4):
        # Process takes the callable and its arguments as keywords
        proc.append(multiprocessing.Process(target=predict, args=tasks[i]))
        proc[i].start()
    for i in range(4):
        proc[i].join()

In Method 2, I can print dir(model), so I think the model was passed to the subprocesses successfully. But then I got this error:

E tensorflow/stream_executor/cuda/cuda_driver.cc:1296] failed to enqueue async memcpy from host to device: CUDA_ERROR_NOT_INITIALIZED; GPU dst: 0x13350b2200; host src: 0x2049e2400; size: 4=0x4

My environment:

  • Ubuntu 16.04, Python 2.7
  • Keras 2.0.8 (TensorFlow backend)
  • one Titan X, Driver version 384.98, CUDA 8.0

Looking forward to your replies! Thanks!

Eason Yang
  • Have you ever solved this problem? I am facing the same pickling problem here. Using a pure Process instead of a Pool made the process hang instead of failing to pickle, but I am not sure whether that is progress at all. – Eduardo Oct 27 '18 at 16:40

3 Answers


Maybe you can use apply_async() instead of Pool.map().

You can find more details here:

Python multiprocessing pickling error
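
A minimal sketch of the apply_async() call pattern with a toy worker (predict here is a stand-in function, not the Keras model):

import multiprocessing

def predict(num):
    # Hypothetical worker; the return value must itself be picklable.
    return num * num

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    # apply_async submits one call at a time and returns an AsyncResult.
    # Note the arguments are still sent through pickle, so passing an
    # unpicklable Keras model here would fail in the same way.
    results = [pool.apply_async(predict, args=(i,)) for i in range(6)]
    pool.close()
    pool.join()
    print [r.get() for r in results]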

Statham

Multiprocessing works on the CPU, while model prediction happens on the GPU, of which there is only one. I cannot see how multiprocessing can help you with prediction.

Instead, I think you can use multiprocessing to scan different patches, which you seem to have already managed to achieve. Then stack these patches into a batch or batches and predict them in parallel on the GPU.
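
One way to arrange this, as a rough sketch: keep the model in the main process only, have the workers return plain numpy arrays (which pickle cheaply), and run one batched predict() on the GPU. The slide-reading logic is elided; scan_line, the paths, and the 299-pixel patch size (Inception-ResNet-v2's input size) are assumptions for illustration:

import multiprocessing
import numpy as np

def scan_line(args):
    # Hypothetical worker: cut one row of the slide into patches.
    image_path, row, patch_size = args
    # Placeholder extraction so the sketch runs end to end; replace
    # with real slide reading for image_path / row.
    return [np.zeros((patch_size, patch_size, 3), dtype='float32')
            for _ in range(4)]

if __name__ == '__main__':
    from keras.models import load_model
    model = load_model("path of hdf5 file")  # loaded once, main process only

    pool = multiprocessing.Pool(4)
    jobs = [("path of slide", row, 299) for row in range(8)]
    patches = []
    for row_patches in pool.map(scan_line, jobs):
        patches.extend(row_patches)
    pool.close()
    pool.join()

    # One batched GPU call instead of many per-patch calls.
    batch = np.stack(patches)
    predictions = model.predict(batch, batch_size=32)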

Diansheng

As noted by Statham, multiprocessing requires all args to be compatible with pickle. This blog post describes how to save a Keras model as a pickle: http://zachmoshe.com/2017/04/03/pickling-keras-models.html. It may be a sufficient workaround to get your Keras model passed as an arg to multiprocessing, but I have not tested the idea myself.
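
Roughly, the trick in that post is to monkey-patch keras.models.Model with __getstate__/__setstate__ methods that round-trip the model through Keras's own HDF5 serialization. A sketch of the idea (written from memory of the post, so treat it as an approximation and check the post for the exact code):

import tempfile
import keras.models

def make_keras_picklable():
    # Serialize via save_model into a temporary HDF5 file, and
    # deserialize by loading that file back with load_model.
    def __getstate__(self):
        with tempfile.NamedTemporaryFile(suffix='.hdf5', delete=True) as fd:
            keras.models.save_model(self, fd.name, overwrite=True)
            model_str = fd.read()
        return {'model_str': model_str}

    def __setstate__(self, state):
        with tempfile.NamedTemporaryFile(suffix='.hdf5', delete=True) as fd:
            fd.write(state['model_str'])
            fd.flush()
            model = keras.models.load_model(fd.name)
        self.__dict__ = model.__dict__

    keras.models.Model.__getstate__ = __getstate__
    keras.models.Model.__setstate__ = __setstate__

Note that unpickling goes through load_model, so each subprocess would still pay the full model-loading cost once when it first receives the model.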

I will also add that I had better luck running two Keras processes on a single GPU using Windows rather than Linux. On Linux I was getting out-of-memory errors on the second process, but the same memory allocation (45% of total GPU RAM for each) worked on Windows. In my case the processes were running fits; for running predictions only, the memory requirements may be smaller.
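
For reference, that per-process cap can be set through the TensorFlow session Keras uses. A minimal sketch for a Keras 2.0.x / TensorFlow-backend setup, assuming the 45% split mentioned above; run it before building or loading the model in each process:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

# Cap this process at roughly 45% of GPU memory so a second
# process can claim a similar share.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.45
set_session(tf.Session(config=config))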

J B