
Is it possible to use Pool.map() on a function that takes an empty dictionary as one of its arguments? I am new to multiprocessing and want to parallelize a web-scraping function. I tried following the example from this site; however, it doesn't include a dictionary as one of the arguments. The multiprocessing function works (it prints out the search result), but it does not add anything to the dictionary: after the process completes, the dictionary is still empty. It looks like I have to use Manager(), but I don't know how to implement it. Thanks for any help.

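from functools import partial
from multiprocessing import Pool
from urllib.request import urlopen as ureq  # assumed: ureq is urlopen (import missing in the original post)
from bs4 import BeautifulSoup as soup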

count = 1
outerDict = dict()
emptyList = []
lstOfItems = ['Valsartan','Estrace','Norvasc','Combivent',
'Fluvirin','Kariva','Natrl','Foxamax','Vilanterol','Catapres']

def process_search(item,soupPage_,outerDict,count,emptyList):
    '''a function that scrapes a site; the outerDict and emptyList will
    become populated as it scrapes the site for each item'''

def callSrch(item,outerDict,emptyList,count):
    searchlink = 'http://www.asite.com'
    uClient=ureq(searchlink+item)
    pagehtml = uClient.read()
    soupPage_ = soup(pagehtml,'html.parser')
    process_search(item,soupPage_,outerDict,count,emptyList)

with Pool() as p:
    prfx = partial(callSrch,outerDict=outerDict,emptyList=emptyList,count=count)
    p.map(prfx, lstOfItems)
Spencer Trinh
  • That `partial` seems wrong. It passes `outerDict` as first argument, whereas `callSrch`'s first arg is `item`. I.e., when using positional arguments with `partial`, they must come in exactly that order, beginning from the first, in the wrapped function. – Jeronimo Aug 31 '18 at 06:05
  • Thanks for spotting that, 'item' should be from 'lstOfItems'. How should that be constructed if 'lstOfItems' is passed in as a list into p.map? How can I feed x items at a time? – Spencer Trinh Aug 31 '18 at 22:57
  • Side-note: [Don't use `functools.partial` to bind arguments for `multiprocessing.Pool.map` and the like](https://stackoverflow.com/q/35062087/364696), as it increases the overhead of task dispatch (not a lot in this case, but `def`ing a simple global function with default arguments would avoid all of it). In this case, it doesn't even do you any good; you seem to expect `outerDict` and `emptyList` to be populated in the parent process, but that won't happen; *copies* of each will be populated in the child, then thrown away. – ShadowRanger Sep 05 '18 at 02:44
  • ok thanks for the info. I simply copied what I found on stackoverflow, however seems like it doesn't apply in my case, as you said. How would I pass multiple arguments to the `.map()` function without `partial` then? I'm not sure what you mean by defining a global function either. Do you mean type `global` for each of those arguments within the function? – Spencer Trinh Sep 05 '18 at 02:56
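
Following up on the comments above, here is a minimal sketch of two ways to get the results back into the parent process without `partial`. It is an illustration under assumptions, not a tested fix for the original scraper: `fetch_and_parse` is a hypothetical stand-in for the real download-and-parse step (it does no network access), and the function names `scrape_one`, `scrape_all`, `scrape_into` and `scrape_all_shared` are invented for this example. Option 1 has each worker return a `(key, value)` pair and builds the dictionary in the parent. Option 2 uses `Manager().dict()`, as mentioned in the question, and passes several arguments per task with `Pool.starmap`.

```python
from multiprocessing import Manager, Pool

BASE_URL = "http://www.asite.com"  # placeholder URL taken from the question


def fetch_and_parse(item):
    """Hypothetical stand-in for the real download-and-parse step.

    A real version would open BASE_URL + item and parse the page.
    """
    return {"url": BASE_URL + item, "name_length": len(item)}


def scrape_one(item):
    """Worker for option 1: return a (key, value) pair, mutate nothing."""
    return item, fetch_and_parse(item)


def scrape_all(items, processes=None):
    """Option 1: map over the items and build the dict in the parent."""
    with Pool(processes) as pool:
        pairs = pool.map(scrape_one, items)
    return dict(pairs)


def scrape_into(item, shared):
    """Worker for option 2: write into a Manager-backed dict proxy."""
    shared[item] = fetch_and_parse(item)


def scrape_all_shared(items, processes=None):
    """Option 2: share one dict between processes through a Manager."""
    with Manager() as manager:
        shared = manager.dict()
        with Pool(processes) as pool:
            # starmap unpacks each tuple into the worker's arguments
            pool.starmap(scrape_into, [(item, shared) for item in items])
        # Copy into a plain dict before the manager process shuts down
        return dict(shared)
```

Either function would be called from under an `if __name__ == '__main__':` guard, for example `outerDict = scrape_all(lstOfItems)`. Option 1 is usually the simpler choice, since every write to a Manager dict is a round trip to the manager process.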

0 Answers