
I am a beginner when it comes to Python threading and multiprocessing, so please bear with me.

I want to make a system that consists of three Python scripts. The first one creates some data and continuously sends it to the second script. The second script takes the data and saves it to a file until the file exceeds a defined size limit. When that happens, the third script sends the data to an external device and gets rid of this "cache". I need all of this to happen concurrently. The pseudocode below sums up what I am trying to do.

def main_1():
    data = [1, 2, 3]
    send_to_second_script(data)

def main_2():
    rec_data = receive_from_first_script()
    save_to_file(rec_data)
    if file > limit:
        signal_third_script()

def main_3():
    if signal is True:
        send_data_to_external_device()
        remove_data_from_disk()

I understand that I can use queues to make this happen, but I am not sure how.

Also, so far I have tried a different approach: one Python script that uses threading to spawn a thread for each part of the process. Is this correct, or is using queues better?

amro_ghoneim
  • Even with threads you'd need queues for thread-to-thread communication. Just a different kind – rdas Apr 27 '20 at 08:53
  • @rdas can you please elaborate? If I understand what you mean, I am currently using mutex when working on shared data. – amro_ghoneim Apr 27 '20 at 08:59
  • Right, you can replace the mutex with a queue from the `threading` module. The solution will look pretty much the same with processes (you'd need to use `multiprocessing.Queue` instead). – rdas Apr 27 '20 at 09:00
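For reference, a minimal sketch of the thread-based variant rdas describes, with `queue.Queue` standing in for a mutex-protected shared structure (the producer/consumer split and the None sentinel are illustrative, not from the question):

import queue
import threading

def producer(q):
    for _ in range(3):
        q.put([1, 2, 3])  # Queue.put is thread-safe; no explicit mutex needed
    q.put(None)           # sentinel tells the consumer to stop

def consumer(q):
    while True:
        data = q.get()    # blocks until an item is available
        if data is None:
            break
        print("received", data)

q = queue.Queue()
threads = [threading.Thread(target=producer, args=(q,)),
           threading.Thread(target=consumer, args=(q,))]
for t in threads:
    t.start()
for t in threads:
    t.join()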

1 Answer


Firstly, you need to be aware of what multithreading/multiprocessing in Python actually gives you. IMO you should be considering multiprocessing instead of multithreading: because of the GIL, Python threads never run Python code in parallel (they are still useful for IO-bound work), and there are many explanations out there on which one to use. The easiest way to choose is to check whether your program is IO-bound or CPU-bound. Anyway, on to the Queue, which is a simple way to pass data between multiple processes in Python.

Using your pseudocode as an example, here is how you would use a Queue.

import multiprocessing


def main_1(output_queue):
    test = 0
    while test <= 10:  # simple limit so this does not run forever
        data = [1, 2, 3]
        print("Process 1: Sending data")
        output_queue.put(data)  # put data on the queue (FIFO)
        test += 1
    output_queue.put("EXIT")  # triggers the exit clause in main_2

def main_2(input_queue, output_queue):
    file = 0  # dummy pseudo variables
    limit = 1
    while True:
        rec_data = input_queue.get()  # get the next item (FIFO); blocks if the queue is empty
        if rec_data == "EXIT":  # the exit clause is a way to cleanly shut down your processes
            output_queue.put("EXIT")
            print("Process 2: exiting")
            break
        print("Process 2: saving to file:", rec_data, "count =", file)
        file += 1
        # save_to_file(rec_data)
        if file > limit:
            file = 0
            output_queue.put(True)

def main_3(input_queue):
    while True:
        signal = input_queue.get()

        if signal is True:
            print("Process 3: Data sent and removed")
            # send_data_to_external_device()
            # remove_data_from_disk()
        elif signal == "EXIT":
            print("Process 3: Exiting")
            break

if __name__ == '__main__':

    q1 = multiprocessing.Queue()  # initialize the queues and the processes
    q2 = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=main_1, args=(q1,))
    p2 = multiprocessing.Process(target=main_2, args=(q1, q2))
    p3 = multiprocessing.Process(target=main_3, args=(q2,))

    processes = [p1, p2, p3]
    for p in processes:  # start all processes
        p.start()
    for p in processes:  # wait until all processes are finished
        p.join()

The prints may interleave a little because I did not bother to lock stdout, but using a queue ensures that data moves cleanly from one process to the next.

EDIT: Do be aware that you should also have a look at multiprocessing locks to ensure that the file is safe from concurrent access when performing the move/delete. The code above only demonstrates how to use queues.
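For example, here is a minimal sketch of guarding the cache file with a shared `multiprocessing.Lock`, assuming a hypothetical cache path and keeping the question's helpers as placeholder comments:

import multiprocessing
import os

CACHE = "cache.bin"  # hypothetical cache file shared by processes 2 and 3

def save_to_file(lock, data):
    with lock:  # only one process may touch the file at a time
        with open(CACHE, "ab") as f:
            f.write(bytes(data))

def flush_cache(lock):
    with lock:  # the writer cannot append while we send/delete
        # send_data_to_external_device()  # placeholder from the question
        os.remove(CACHE)

if __name__ == '__main__':
    lock = multiprocessing.Lock()  # create once, pass to both processes
    p2 = multiprocessing.Process(target=save_to_file, args=(lock, [1, 2, 3]))
    p3 = multiprocessing.Process(target=flush_cache, args=(lock,))
    p2.start(); p2.join()  # run sequentially here purely to demo the locking
    p3.start(); p3.join()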

Jason Chia
  • Thank you for your answer! If I want to extend this to have three python files, how do I share the queues between them? – amro_ghoneim Apr 27 '20 at 09:28
  • What do you mean three python files? You mean you want to split your program into 3 scripts? Queues can be treated as objects so it shouldn't be too difficult once you get the imports done right (see the sketch after these comments). – Jason Chia Apr 27 '20 at 09:36
  • I have one extra question. Suppose I have two processes that want to access the same shared data at the same time. How can I give priority to one process over the other to lock and use the data first? Is calling this process first enough to do so? – amro_ghoneim Apr 27 '20 at 10:23
  • It really depends on why and how you want to give the shared data to two processes. If both processes need to process the same data, I think creating a secondary queue is easier. If the data needs to be available to both processes for them to do something, you can have a look at the multiprocessing manager. https://stackoverflow.com/questions/3671666/sharing-a-complex-object-between-python-processes – Jason Chia Apr 27 '20 at 11:00
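To illustrate the comment above about splitting the program into three scripts: a sketch of a three-file layout, assuming the `main_*` functions from the answer are moved into modules named producer.py, saver.py, and sender.py (these names are made up for illustration). The launcher owns the queues and passes them in:

# main.py -- launcher script; producer.py, saver.py, and sender.py each
# just define the corresponding main_* function from the answer above
import multiprocessing

from producer import main_1
from saver import main_2
from sender import main_3

if __name__ == '__main__':
    q1 = multiprocessing.Queue()
    q2 = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=main_1, args=(q1,)),
             multiprocessing.Process(target=main_2, args=(q1, q2)),
             multiprocessing.Process(target=main_3, args=(q2,))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()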