I tracked a Python multiprocessing headache down to the import of a module (nltk). Reproducible (hopefully) code is pasted below. This doesn't make any sense to me; does anybody have any ideas?

from multiprocessing import Pool
import time, requests
#from nltk.corpus import stopwords   # uncomment this and it hangs

def gethtml(key, url):
    r = requests.get(url)
    return r.text

def getnothing(key, url):
    return "nothing"

if __name__ == '__main__':
    pool = Pool(processes=4)
    result = list()
    nruns = 4
    url = 'http://davidchao.typepad.com/webconferencingexpert/2013/08/gartners-magic-quadrant-for-cloud-infrastructure-as-a-service.html'
    for i in range(0,nruns):
#        print gethtml(i,url)
        result.append(pool.apply_async(gethtml, [i,url]))
#        result.append(pool.apply_async(getnothing, [i,url]))
    pool.close()

    # monitor jobs until they complete
    running = nruns
    while running > 0:
        time.sleep(1)
        running = 0
        for run in result:
            if not run.ready(): running += 1
        print "processes still running:",running

    # print results
    for i,run in enumerate(result):
        print i,run.get()[0:40]

Note that the 'getnothing' function works. The hang only shows up with the combination of the nltk module import and the requests call. Sigh.

> python --version
Python 2.7.6

> python -c 'import sys;print("%x" % sys.maxsize, sys.maxsize > 2**32)'
('7fffffffffffffff', True)

> pip freeze | grep requests
requests==2.2.1

> pip freeze | grep nltk
nltk==2.0.4
Ziggy Eunicien

1 Answer

I would redirect others with similar problems to solutions which do not use the multiprocessing module:

1) Apache Spark for scalability/flexibility. However, this doesn't seem to be a solution for Python multiprocessing. It looks like pyspark is also limited by the Global Interpreter Lock?

2) 'gevent' or 'twisted' for general Python asynchronous processing: http://sdiehl.github.io/gevent-tutorial/

3) grequests for asynchronous requests (see "Asynchronous Requests with Python requests")
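
For anyone who wants to stay in the standard library, a thread pool is another option for I/O-bound fetches like the one in the question. The following is only a minimal sketch, not one of the libraries listed above: it uses concurrent.futures (standard in Python 3; Python 2.7 would need the `futures` backport), and `fetch_all` and `fake_fetch` are hypothetical names made up for illustration. `fake_fetch` stands in for `requests.get(url).text` so the example runs offline. I have not verified that this avoids the hang described in the question; it just runs the fetches on threads rather than in worker processes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(urls, fetch, max_workers=4):
    """Run fetch(url) for each URL on a thread pool; return {url: result}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the URL it was submitted for.
        future_to_url = {executor.submit(fetch, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            # result() re-raises any exception that fetch raised.
            results[url] = future.result()
    return results


def fake_fetch(url):
    # Hypothetical stand-in for requests.get(url).text (assumption, not real I/O).
    return "<html>%s</html>" % url


if __name__ == "__main__":
    urls = ["http://example.com/a", "http://example.com/b"]
    pages = fetch_all(urls, fake_fetch)
    for url in urls:
        print(url, pages[url][:40])
```

To fetch real pages, pass a function such as `lambda url: requests.get(url).text` in place of `fake_fetch`.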

Community
Ziggy Eunicien