I'm working on an optimization problem, and you can see a simplified version of my code below (the original code is too complicated to post for a question like this, but I hope the simplified version reproduces the original's behaviour as closely as possible).

My purpose: use the function foo inside the function optimization. Since foo can take a very long time in some hard cases, I use multiprocessing to set a time limit on each call (proc.join(iter_time); the method is from an answer to this question: How to limit execution time of a function call?).

My problem:

  1. In the while loop, the value generated for extra is the same in every iteration.
  2. The list lst's length is always 1, which means every iteration of the while loop starts from an empty list.

My guess: a possible reason is that each time I create a process, the random seed starts counting from the beginning again, and each time a process is terminated, some garbage-collection mechanism cleans up the memory the process used, so the list is cleared.

My question

  1. Does anyone know the real reason for these problems?
  2. If not using multiprocessing, is there any other way to realize my purpose while still generating different random numbers? BTW, I have tried func_timeout, but it has other problems that I cannot handle...
import time, random, multiprocessing

random.seed(123)
lst = []  # a global list for logging data

def foo(epoch):
    ...
    extra = random.random()
    lst.append(epoch + extra)
    ...

def optimization(loop_time, iter_time):
    start = time.time()
    epoch = 0
    while time.time() <= start + loop_time:
        proc = multiprocessing.Process(target=foo, args=(epoch,))
        proc.start()
        proc.join(iter_time)
        if proc.is_alive():  # if the process is not terminated within time limit
            print("Time out!")
            proc.terminate()

if __name__ == '__main__':
    optimization(300, 2)
asked by kaiyu wei, edited by martineau
    Not related to the problem, but there are a couple of errors in your code: (1) `args=(epoch)` should say `args=(epoch,)` to convert it to tuple; (2) `if __name__ = '__main__'` should be `if __name__ == '__main__'`; (3) `while time.time() - start <= start + loop_time` won't finish for a very long time: I think you mean `while time.time() <= start + loop_time` – The Thonnu Jul 10 '22 at 17:04
  • 1
    @TheThonnu. XD thanks I'll correct them! – kaiyu wei Jul 10 '22 at 17:22

1 Answer


You need to use shared memory if you want to share variables across processes, because child processes do not share their memory space with the parent. The simplest way to do that here is to use a managed list, and to delete the line that sets the random seed. That seed is what causes the same number to be generated every time: all child processes start from the same seed, so they produce the same random numbers. To get different random numbers, either don't set a seed, or pass a different seed to each process:

import time, random
from multiprocessing import Manager, Process

def foo(epoch, lst):
    extra = random.random()
    lst.append(epoch + extra)

def optimization(loop_time, iter_time, lst):
    start = time.time()
    epoch = 0
    while time.time() <= start + loop_time:
        proc = Process(target=foo, args=(epoch, lst))
        proc.start()
        proc.join(iter_time)
        if proc.is_alive():  # if the process is not terminated within time limit
            print("Time out!")
            proc.terminate()
    print(lst)

if __name__ == '__main__':
    manager = Manager()
    lst = manager.list()
    optimization(10, 2, lst)

Output

[0.2035898948744943, 0.07617925389396074, 0.6416754412198231, 0.6712193790613651, 0.419777147554235, 0.732982735576982, 0.7137712131028766, 0.22875414425414997, 0.3181113880578589, 0.5613367673646847, 0.8699685474084119, 0.9005359611195111, 0.23695341111251134, 0.05994288664062197, 0.2306562314450149, 0.15575356275408125, 0.07435292814989103, 0.8542361251850187, 0.13139055891993145, 0.5015152768477814, 0.19864873743952582, 0.2313646288041601, 0.28992667535697736, 0.6265055915510219, 0.7265797043535446, 0.9202923318284002, 0.6321511834038631, 0.6728367262605407, 0.6586979597202935, 0.1309226720786667, 0.563889613032526, 0.389358766191921, 0.37260564565714316, 0.24684684162272597, 0.5982042933298861, 0.896663326233504, 0.7884030244369596, 0.6202229004466849, 0.4417549843477827, 0.37304274232635715, 0.5442716244427301, 0.9915536257041505, 0.46278512685707873, 0.4868394190894778, 0.2133187095154937]

Keep in mind that using managers will affect the performance of your code. Alternatively, you could use multiprocessing.Array, which is faster than a manager but less flexible in what data it can store, or a Queue.
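The Queue alternative mentioned above could be sketched like this (a minimal, hypothetical variant of the code above, not the answer's exact approach; unlike the code above, it also increments epoch each iteration): each child puts its result on the queue, and the parent drains the queue after the loop finishes.

```python
# Sketch: replacing the managed list with a multiprocessing.Queue.
# Assumption: results only need to be collected by the parent,
# not read back by other children.
import time, random
from multiprocessing import Process, Queue

def foo(epoch, q):
    extra = random.random()   # no global seed, so children differ
    q.put(epoch + extra)      # send the result to the parent

def optimization(loop_time, iter_time, q):
    start = time.time()
    epoch = 0
    while time.time() <= start + loop_time:
        proc = Process(target=foo, args=(epoch, q))
        proc.start()
        proc.join(iter_time)
        if proc.is_alive():   # not finished within the time limit
            print("Time out!")
            proc.terminate()
        epoch += 1

if __name__ == '__main__':
    q = Queue()
    optimization(2, 2, q)
    results = []
    while not q.empty():      # drain whatever the children produced
        results.append(q.get())
    print(results)
```

Note that Queue.empty() is technically unreliable in concurrent use, but after every child has been joined or terminated it is fine for collecting results like this.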

answered by Charchit Agarwal
  • Thanks, it works! I deleted the code setting the random seed and used the shared memory, and it works well. But one more question: since the `lst` in my code is a global list, and it does not belong to any of the child processes, why can't the different child processes share it? – kaiyu wei Jul 10 '22 at 19:49
  • @kaiyuwei All code a parent has outside the `if __name__...` clause will be run again inside the child process because creating a child process involves importing the main script each time. Making `lst` global in the parent process will therefore have no effect in other processes, they simply don't share the same memory space and have their own version of `lst` in their space. Think about it this way, you are essentially running two different scripts, will changes in one propagate to another automatically? If you want to communicate between processes, you need to use shared memory. – Charchit Agarwal Jul 11 '22 at 08:16
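The point in this last comment can be illustrated with a minimal, self-contained sketch (hypothetical, not from the question): after a child mutates its own copy of a module-level list, the parent's list is unchanged.

```python
# Sketch: a module-level list mutated in a child process is not
# visible in the parent, because the child works on its own copy
# of the module's globals.
from multiprocessing import Process

lst = []  # module-level list in the parent

def append_item():
    lst.append(42)  # mutates the CHILD's copy only

if __name__ == '__main__':
    p = Process(target=append_item)
    p.start()
    p.join()
    print(lst)  # the parent's list is still []
```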