I have a function that processes one URL at a time:
    import http.client
    import urllib.error
    import urllib.request

    def sanity(url):
        try:
            # Relative media paths get the S3 host prepended before fetching.
            if 'media' in url[:10]:
                url = "http://dummy.s3.amazonaws.com" + url
            req = urllib.request.Request(url, headers={'User-Agent': "Magic Browser"})
            ret = urllib.request.urlopen(req)
            allurls.append(url)
            return 1
        except (urllib.error.HTTPError, urllib.error.URLError, http.client.HTTPException, ValueError) as e:
            print(e, url)
            allurls.append(url)
            errors.append(url)
            return 0
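Called directly in a single process, the function behaves the way I expect; for example (these URLs are made up just to illustrate):

    allurls = []
    errors = []
    sanity("/media/images/photo.jpg")  # path starts with 'media', so the S3 host gets prepended
    sanity("http://example.com/")      # used as-is
    print(len(allurls), len(errors))   # allurls is populated either way (errors too, if a fetch fails)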
In the main function, I have a list of URLs that need to be processed by the above function. I have tried the following, but it doesn't work:
    from multiprocessing import Process

    start = 0
    allurls = []
    errors = []
    # arr = [0, 100, 200, ...]
    for i in arr:
        # One process per slice of URLs
        p = Process(target=sanity, args=(urls[start:i],))
        p.start()
        p.join()
The above code is supposed to process the URLs in batches of 100, but it doesn't work. I know it isn't working because I write the lists allurls and errors to two different files, and both files are empty when they should not be. I don't understand this behavior.