
I made a multiprocessing crawler. Below is the simplified structure of my code:

import multiprocessing as mp

class abc:
    def all(self):
        return "This is abc \n what a abc!\n"

class bcd:
    def all(self):
        return "This is bcd \n what a bcd!\n"

class cde:
    def all(self):
        return "This is cde \n what a cde!\n"

class ijk:
    def all(self):
        return "This is ijk \n what a ijk!\n"


def crawler(sites, ps_queue):
    for site in sites:
        ps_queue.put(site.all())

messages = ''
def message_collector(ps_queue):
    global messages
    while True:
        message = ps_queue.get()
        messages += message
        ps_queue.task_done()

def main():
    ps_queue = mp.JoinableQueue()
    message_collector_proc = mp.Process(
        target=message_collector,
        args=(ps_queue, )
    )
    message_collector_proc.daemon = True
    message_collector_proc.start()

    site_list = [abc(), bcd(), cde(), ijk(), abc()]
    crawler(site_list, ps_queue)
    ps_queue.join()

    print(messages)

if __name__ == "__main__":
    main()

Questions about this code:

1. At the end of main() there is a print(messages) call, but it doesn't print anything. Why does that happen?

2. It also occurred to me that messages could get corrupted because each process accesses the global variable messages at the same time. Is a lock needed, or do the processes access messages in order in this case?

Thanks

user3595632

1 Answer


When you call mp.Process(target=message_collector, args=(ps_queue, )), you start a new process, and that process gets its own messages object. The child's copy is updated correctly, but when you try to print messages in the main process it is empty, because it is not the same object you appended data to. You need a synchronization object to share messages between processes; see How to share a string amongst multiple processes using Managers() in Python?
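To see the isolation directly, here is a minimal standalone sketch (separate from your crawler; worker and the string it assigns are just illustrative names) showing that a child process only ever modifies its own copy of a module-level global:

```python
import multiprocessing as mp

messages = ''  # each process gets its own copy of this global

def worker():
    global messages
    messages = 'set in child'  # changes only the child's copy

if __name__ == "__main__":
    p = mp.Process(target=worker)
    p.start()
    p.join()
    # the parent's copy was never touched
    print(repr(messages))  # prints ''
```

The parent still sees the empty string even though the child assigned to messages, which is exactly why print(messages) in your main() prints nothing.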

Here's a simple example using a Manager().dict() proxy (not the most elegant or efficient solution, but definitely the easiest):

import multiprocessing as mp

class abc:
    def all(self):
        return "This is abc \n what a abc!\n"

class bcd:
    def all(self):
        return "This is bcd \n what a bcd!\n"

class cde:
    def all(self):
        return "This is cde \n what a cde!\n"

class ijk:
    def all(self):
        return "This is ijk \n what a ijk!\n"

manager = mp.Manager()
messages_dict = manager.dict()

def crawler(sites, ps_queue):
    for site in sites:
        ps_queue.put(site.all())

def message_collector(ps_queue):
    messages = ''
    while True:
        message = ps_queue.get()
        messages += message
        messages_dict['value'] = messages
        ps_queue.task_done()

def main():
    ps_queue = mp.JoinableQueue()
    message_collector_proc = mp.Process(
        target=message_collector,
        args=(ps_queue, )
    )
    message_collector_proc.daemon = True
    message_collector_proc.start()

    site_list = [abc(), bcd(), cde(), ijk(), abc()]
    crawler(site_list, ps_queue)
    ps_queue.join()

    print(messages_dict['value'])

if __name__ == "__main__":
    main()
AntiMatterDynamite