
I'm writing a program that uses multiple processes (for CPU-bound work) and multiple threads (for IO-bound work). The code below is only a sample, not the actual program.

But when the code reaches join(), the program deadlocks.

My code is posted below:

import requests
import time
from multiprocessing import Process, Queue
from multiprocessing.dummy import Pool


start = time.time()
queue = Queue()
rQueue = Queue()
url = 'http://www.bilibili.com/video/av'
for i in xrange(10):
    queue.put(url+str(i))


def goURLsCrawl(queue, rQueue):
    threadPool = Pool(7)
    while not queue.empty():
        threadPool.apply_async(urlsCrawl, args=(queue.get(), rQueue))
    threadPool.close()
    threadPool.join()
    print 'end'


def urlsCrawl(url, rQueue):
    response = requests.get(url)
    rQueue.put(response)


p = Process(target=goURLsCrawl, args=(queue, rQueue))
p.start()
p.join()  # join() is here
end = time.time()
print 'total time %0.4f' % (end-start,)

Thanks in advance.

zhilian
  • What have you done to tackle the problem? Have you looked at this? https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process.join – Munchhausen Aug 06 '16 at 10:55
  • Wow, that's weird. Removing either line in `urlsCrawl` fixes the deadlock, but why? – Valentin Lorentz Aug 06 '16 at 10:57
  • @Munchhausen Yes, I have seen this before, but I couldn't find anything wrong with my code. If I remove the rQueue.put() call it works fine, and I also found that it works fine if rQueue is empty. Maybe the problem is related to Queue.put(). – zhilian Aug 06 '16 at 11:29
  • Have you seen this? http://stackoverflow.com/a/9189249/6647217 – Munchhausen Aug 06 '16 at 11:43
  • @Munchhausen Does that help? I didn't find anything helpful in it. If you `print` the `response` and `rQueue` you will find that everything completes, except that the program never exits. – zhilian Aug 06 '16 at 12:15
  • `response` is a complex object possibly tied to a file or other things having problems with pickle, have you tried `rQueue.put(response.text)` (or the like) instead of `rQueue.put(response)`? – janbrohl Aug 06 '16 at 13:26
  • @janbrohl I've tried it just now, and it didn't work. – zhilian Aug 06 '16 at 14:02

1 Answer


I finally found the cause. As you can see, I imported Queue from multiprocessing, which is meant for passing data between processes, but my code has threads accessing that Queue, so I suspect something goes wrong behind the scenes.

To fix it, import `Queue` from the standard `Queue` module instead of using `multiprocessing.Queue`.
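
A related point from the `multiprocessing` documentation is worth keeping in mind: a process that has put items on a `multiprocessing.Queue` waits, before exiting, until its buffered items have been flushed to the underlying pipe, so joining it before the queue has been consumed can deadlock. The sketch below is one possible way to keep `multiprocessing.Queue` and still get the results back in the parent, by draining the queue before calling `join()`. It is written for Python 3 and assumes a POSIX system where the `fork` start method is available. The names `fake_fetch`, `crawl`, `run` and `PAYLOAD_SIZE` are hypothetical, and `fake_fetch` stands in for `requests.get()` so that the example needs no network access.

```python
import multiprocessing as mp
from multiprocessing.dummy import Pool  # thread pool with the Pool API

PAYLOAD_SIZE = 100000  # assumed size; large enough to fill a typical pipe buffer


def fake_fetch(url, results):
    # Hypothetical stand-in for requests.get(url): put plain, picklable data
    # on the queue rather than a full response object.
    results.put((url, 'x' * PAYLOAD_SIZE))


def crawl(urls, results):
    # Runs in the child process: fan the IO-bound work out to threads.
    pool = Pool(4)
    for url in urls:
        pool.apply_async(fake_fetch, args=(url, results))
    pool.close()
    pool.join()


def run(n=10):
    ctx = mp.get_context('fork')  # assumption: POSIX with fork available
    urls = ['http://www.bilibili.com/video/av%d' % i for i in range(n)]
    results = ctx.Queue()
    worker = ctx.Process(target=crawl, args=(urls, results))
    worker.start()
    # Drain first: one get() per expected result. The timeout turns a lost
    # result into a queue.Empty exception instead of an indefinite wait.
    collected = [results.get(timeout=30) for _ in urls]
    # Only now is it safe to join: the child has nothing left to flush.
    worker.join()
    return collected


if __name__ == '__main__':
    print(len(run()))
```

Swapping the last two steps in `run` (calling `worker.join()` before the `get()` calls) brings back the behaviour described in the question once the payloads are large enough to fill the pipe.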

zhilian