I made a multiprocessing crawler. Below is a simplified version of its structure:
    import multiprocessing as mp


    class abc:
        def all(self):
            return "This is abc \n what a abc!\n"

    class bcd:
        def all(self):
            return "This is bcd \n what a bcd!\n"

    class cde:
        def all(self):
            return "This is cde \n what a cde!\n"

    class ijk:
        def all(self):
            return "This is ijk \n what a ijk!\n"


    def crawler(sites, ps_queue):
        # Put each site's output on the queue.
        for site in sites:
            ps_queue.put(site.all())


    messages = ''

    def message_collector(ps_queue):
        # Runs in a separate process and appends every message to the global.
        global messages
        while True:
            message = ps_queue.get()
            messages += message
            ps_queue.task_done()


    def main():
        ps_queue = mp.JoinableQueue()
        message_collector_proc = mp.Process(
            target=message_collector,
            args=(ps_queue, )
        )
        message_collector_proc.daemon = True
        message_collector_proc.start()

        site_list = [abc(), bcd(), cde(), ijk(), abc()]
        crawler(site_list, ps_queue)
        ps_queue.join()

        print(messages)


    if __name__ == "__main__":
        main()
Questions about this code:

1. At the end of main() there is print(messages), but it doesn't print anything. Why does that happen? (A minimal sketch of what I suspect is going on is below the questions.)

2. It also occurred to me that messages could get corrupted because each process accesses the global variable messages at the same time. Will a lock be needed, or does each process access messages in order in this case?
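
For reference, here is a minimal standalone sketch, separate from my crawler (the worker function and the strings are just placeholders I made up), that seems to show each process getting its own copy of a module-level global. If I understand it correctly, the child's changes to messages never show up in the parent:

    import multiprocessing as mp

    messages = ''

    def worker():
        # Modify this process's own copy of the module-level global.
        global messages
        messages += 'written in child\n'
        print('in child :', repr(messages))

    if __name__ == '__main__':
        p = mp.Process(target=worker)
        p.start()
        p.join()
        # The parent's copy of messages is unchanged, so this prints ''.
        print('in parent:', repr(messages))

When I run this, the child prints the string it appended while the parent still prints an empty string, which seems to match what happens with my collector process.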
Thanks