I've been on this site a while and I've found so many helpful solutions to the problems I've run into while building my first Python program. I'm hopeful you can help me once again.
I am trying to launch a variable number of worker processes with multiprocessing, each one taking a small slice of a list of URLs to scan. I have been tinkering with queues, but whenever I add one it puts a sizable amount of time onto my loop. I want to keep my speed while still protecting Titles.txt from garbled output. First, a simplified sketch of the kind of queue approach I tried, and then my current code.
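For context, this is roughly the shape of the queue version I was tinkering with, boiled down to placeholders (the url chunks and the 'title from ...' strings stand in for the real scraping, so this is not my exact code). The children put each matching title on a shared multiprocessing.Queue and the parent does every write, which keeps Titles.txt clean but slows things down for me:

    import multiprocessing

    def worker(q, urls):
        # placeholder scrape: pretend each url yields one matching title
        for u in urls:
            q.put('title from ' + u)
        q.put(None)  # sentinel so the parent knows this worker is finished

    if __name__ == '__main__':
        q = multiprocessing.Queue()
        chunks = [['url1', 'url2'], ['url3', 'url4']]  # placeholder slices of my url list
        procs = [multiprocessing.Process(target=worker, args=(q, c)) for c in chunks]
        for p in procs:
            p.start()
        finished = 0
        with open('Titles.txt', 'a') as f:
            while finished < len(procs):
                item = q.get()  # only the parent ever touches the file
                if item is None:
                    finished += 1
                else:
                    f.write(item + '\n')
        for p in procs:
            p.join()

And here is my current code, without the queue, which is the fastest version I have: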
import multiprocessing
import urllib.request
import bs4 as bs

l = ['url1', 'url2', etc]

def output(t):
    f = open('Titles.txt', 'a')
    f.write(t + '\n')
    f.close()

def job(y, processload):
    calender = ['Jan', 'Feb', 'Mar', 'Dec']  # the things i want to find
    for i in range(processload):  # loop processload times
        source = urllib.request.urlopen(l[y]).read()  # read url #y
        soup = bs.BeautifulSoup(source, 'lxml')
        for t in soup.html.head.find_all('title'):
            text = t.get_text()
            if any(word in text for word in calender):
                output(text)  # this is what i need to queue
        y += 1  # advance url by 1
if __name__ == '__main__':
    processload = 5  # the number of urls to be scanned by each job
    runcount = 0
    while runcount == 0:  # engage loop: rescan the whole list each pass
        y = 0  # the index of the first url for the next job
        procs = []
        for i in range(380 // processload):  # the list size / 5
            p = multiprocessing.Process(target=job, args=(y, processload))
            p.start()
            procs.append(p)
            y += processload  # jump y ahead to the next slice
        for p in procs:
            p.join()  # let this pass finish before starting the next one
The code above gives me the best speed I've managed so far, and I'd like to keep that speed while also protecting my output. I have been searching through examples, but I haven't yet found code that creates a lock or queue and uses it inside the child processes. How would you recommend I proceed?
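For what it's worth, the direction I've been leaning toward is creating a multiprocessing.Lock in the parent and handing it to every child, so each child writes the file itself but only while it holds the lock. This is only a sketch with placeholder scraping, and I'm not sure it's the recommended pattern:

    import multiprocessing

    def output(lock, t):
        with lock:  # only one process can write at a time, so lines can't interleave
            with open('Titles.txt', 'a') as f:
                f.write(t + '\n')

    def job(lock, urls):
        for u in urls:
            # placeholder for the real scrape of url u
            output(lock, 'title from ' + u)

    if __name__ == '__main__':
        lock = multiprocessing.Lock()
        chunks = [['url1', 'url2'], ['url3', 'url4']]  # placeholder slices of my url list
        procs = [multiprocessing.Process(target=job, args=(lock, c)) for c in chunks]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

Is passing the lock around like that reasonable, or would a queue feeding a single writer process be the better trade-off between speed and a clean Titles.txt?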
Thank you very much.