
So as the title says, I have a program that I want to parallelize via multiprocessing. Specifically, I have a dict with an int as the key and a string (it's a URL) as the value.

All I want to do is create a new dict, having done some transformations to the URL of the old dict:

def convert_url(_id, url, sess):
    im = get_image(url)  # get_image is defined elsewhere
    return _id, sess.run(end_points['vgg_16/fc7'], feed_dict={input_tensor: im})

You can see I'm retrieving the image from the URL and then using a tensorflow CNN to transform the image.

However, I can't manage to parallelize this function. Here's what I've tried (assume `sess` and the other tensorflow objects are initialized elsewhere):

from multiprocessing import Pool
from functools import partial

with Pool(processes=3) as pool:
    results = pool.starmap(partial(convert_url, sess=sess), some_dict.items())

What I end up getting is this TypeError:

TypeError: can't pickle _thread.lock objects

(I realize `results` wouldn't be a dict as written, but it would be trivial to convert it into one. I just don't know another way to structure this.)
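For what it's worth, the conversion is a one-liner, since `starmap` returns a list of `(key, value)` tuples (the values below are stand-ins for the real feature arrays):

```python
# starmap on convert_url yields (_id, features) pairs, so the built-in
# dict() constructor rebuilds the mapping directly.
results = [(1, 'features_a'), (2, 'features_b')]  # stand-in for pool.starmap output
new_dict = dict(results)
```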

Any help?

anon
    `multiprocessing` communicates values to sub-processes using the `pickle` format, but that can't represent many types of objects. Unfortunately it appears a tensorflow session contains such an object, and anyway thread locking doesn't work inter-process. I hope this might give you insight into why you may need a different technique. [This unanswered question](https://stackoverflow.com/questions/36610290/tensorflow-and-multiprocessing-passing-sessions) might expand. – holdenweb Aug 27 '17 at 10:00
  • Ah, thank you. That does help. It appears that one solution is giving each thread its own tensorflow session. I'm not sure how to do that but I'm going to look into that, as it would work fine. – anon Aug 27 '17 at 20:12
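A sketch of that per-worker approach, using `Pool`'s `initializer` hook. The session here is a plain stand-in object, not a real `tf.Session`: the point is only that the unpicklable resource gets built inside each worker process rather than pickled and shipped from the parent.

```python
from multiprocessing import get_context

# Module-level slot for the per-process resource; each worker fills it
# in exactly once via the Pool initializer, so nothing unpicklable ever
# crosses the process boundary.
_session = None

def init_worker():
    global _session
    # Stand-in for per-process setup; with tensorflow, this is where the
    # graph would be built and tf.Session() created.
    _session = {'ready': True}

def convert_url(_id, url):
    # Uses the worker-local session instead of a pickled argument.
    assert _session is not None
    return _id, len(url)  # stand-in for sess.run(...) on the fetched image

def run(some_dict):
    # The "fork" start method keeps this demo import-safe on Linux;
    # adjust for your platform if needed.
    ctx = get_context('fork')
    with ctx.Pool(processes=3, initializer=init_worker) as pool:
        return dict(pool.starmap(convert_url, some_dict.items()))
```

Whether several tensorflow sessions per machine is affordable depends on the model's memory footprint, so this is a pattern sketch, not a guarantee it fits this CNN.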

0 Answers