
Sorry if the headline is strange. Let me explain.

Let's say there is a handler.py:

import funcs
import requests

def initialize_calculate(data):
    check_data(data)
    funcs.calculate(data)  # takes a lot of time, like 30 minutes
    print('Calculation launched')
    requests.get('hostname', params={'func': 'calculate', 'status': 'launched'})

and here is funcs.py:

import requests

def calculate(data):
    result = make_calculations(data)
    requests.get('hostname', params={'func': 'calculate', 'status': 'finished', 'result': result})

So what I want is for the handler to be able to initialize another function, no matter where it lives, but not wait until it ends, because I want to notify the client side that the process has started; when it's done, the process itself will send the result.

How can I launch an independent process running the function calculate from initialize_calculate?

I'd like to know if it's possible without non-native libraries or frameworks.

Archirk
  • I think what you want is remote procedure call (RPC). Maybe this helps: https://stackoverflow.com/questions/1879971/what-is-the-current-choice-for-doing-rpc-in-python – Wups Oct 16 '20 at 21:52
  • If a third-party solution is fine for you, then you might want to take a look at Celery https://docs.celeryproject.org/en/stable/ – ihoryam Oct 21 '20 at 11:45
  • If, on the other hand, you don't want a third party for that, then you can use `threading`, `asyncio` or `multiprocessing`; check out https://realpython.com/python-concurrency/, which contains samples of each of those – ihoryam Oct 21 '20 at 11:54
  • Which OS? "...another function no matter where". Do you mean on another machine, or just another file on the same node which should run as an independent process? – Darkonaut Oct 21 '20 at 14:35
  • @Darkonaut file. – Archirk Oct 21 '20 at 17:09
  • And should the parent process stay alive, or should it exit before the child is finished? Please specify your operating system. – Darkonaut Oct 21 '20 at 17:36
  • @Darkonaut Ubuntu; the parent should exit before the child finishes – Archirk Oct 22 '20 at 18:05

2 Answers


You can use the Process class from the multiprocessing module to do that.

Here is an example:

from multiprocessing import Process
import requests

def calculate(data):
    result = make_calculations(data)
    requests.get('hostname', params={'func': 'calculate', 'status': 'finished', 'result': result})

def initialize_calculate(data):
    check_data(data)
    p = Process(target=calculate, args=(data,))
    p.start()  # returns immediately; the child runs calculate() on its own
    print('Calculation launched')
    requests.get('hostname', params={'func': 'calculate', 'status': 'launched'})
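
Note that the process has to be started from within an `if __name__ == '__main__':` guard when the calling module is run as a script, otherwise spawn-based start methods raise `RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase`. A minimal usage sketch, assuming the snippet above is saved as handler.py (file name and data are illustrative):

main.py

from handler import initialize_calculate

if __name__ == '__main__':
    # the guard keeps this module importable by child processes
    # without re-triggering process creation
    initialize_calculate({'some': 'data'})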
Tibebes. M
  • It works in `__name__ == '__main__'`, else it throws `RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase`. I can't call initialize_calculate from another module, but your solution is the closest to what I seek – Archirk Oct 21 '20 at 17:16
  • I am getting following two errors: `NameError: name 'check_data' is not defined` and `requests.exceptions.MissingSchema: Invalid URL 'hostname': No schema supplied. Perhaps you meant http://hostname? ` – alper Aug 07 '21 at 23:12
  • @alper the answer is written based on the snippet the OP has given (i.e., it is assumed that you have a custom function named `check_data` that you have created and that `hostname` is your URL; you are supposed to replace those) – Tibebes. M Aug 08 '21 at 05:19
  • Can the main process wait for the `calculate` subprocess to finish and continue when the subprocess returns? – alper Aug 08 '21 at 09:53

If you don't want to use a 3rd-party lib like daemonocle implementing a "well-behaved" Unix daemon, you could use subprocess.Popen() to create an independent process. Another option would be to modify multiprocessing.Process to prevent auto-joining of the child when the parent exits.


subprocess.Popen()

With subprocess.Popen() you start the new process by specifying commands and arguments, like manually from a terminal. This means you need to make funcs.py or another file a top-level script which parses string arguments from the command line and then calls funcs.calculate() with these arguments.

I boiled your example down to the essence so we don't have to read too much code.

funcs.py

#!/usr/bin/env python3
# UNIX: enable executable from terminal with: chmod +x filename
import os
import sys
import time

import psutil  # 3rd party for demo


def print_msg(msg):
    print(f"[{time.ctime()}, pid: {os.getpid()}] --- {msg}")


def calculate(data, *args):
    print_msg(f"parent pid: {psutil.Process().parent().pid}, start calculate()")
    for _ in range(int(500e6)):  # busy loop simulating a long calculation
        pass
    print_msg(f"parent pid: {psutil.Process().parent().pid}, end calculate()")


if __name__ == '__main__':

    if len(sys.argv) > 1:
        calculate(*sys.argv[1:])

subp_main.py

#!/usr/bin/env python3
# UNIX: enable executable from terminal with: chmod +x filename
if __name__ == '__main__':

    import time
    import logging
    import subprocess
    import multiprocessing as mp

    import funcs

    mp.log_to_stderr(logging.DEBUG)

    filename = funcs.__file__
    data = ("data", 42)

    # in case filename is an executable you don't need "python" before `filename`:
    subprocess.Popen(args=["python", filename, *[str(arg) for arg in data]])
    time.sleep(1)  # keep parent alive a bit longer for demo
    funcs.print_msg("exiting")

Important for testing: run from a terminal, e.g. not PyCharm-Run, because it won't show what the child prints. In the last line below you see the child process' parent-id changed to 1, because the child got adopted by systemd (Ubuntu) after the parent exited.

$> ./subp_main.py
[Fri Oct 23 20:14:44 2020, pid: 28650] --- parent pid: 28649, start calculate()
[Fri Oct 23 20:14:45 2020, pid: 28649] --- exiting
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] running the remaining "atexit" finalizers
$> [Fri Oct 23 20:14:54 2020, pid: 28650] --- parent pid: 1, end calculate()

class OrphanProcess(multiprocessing.Process)

If you're looking for something more convenient: you can't use the high-level multiprocessing.Process as is, because it doesn't let the parent process exit before the child, as you asked for. Regular child processes are either joined (awaited) or terminated (if you set the daemon flag for the Process) when the parent shuts down. This still happens within Python. Note that the daemon flag doesn't make a process a Unix daemon; the naming is a somewhat frequent source of confusion.
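
For illustration, a minimal sketch of that regular behavior (names and sleep-time are illustrative): with `daemon=True` the child gets terminated at parent shutdown, with the default `daemon=False` the parent blocks in an automatic join until the child is done:

import time
import multiprocessing as mp


def work():
    time.sleep(10)
    print("done")  # never printed when daemon=True


if __name__ == '__main__':
    p = mp.Process(target=work, daemon=True)
    p.start()
    # parent exits right away: the daemonic child gets terminated;
    # with daemon=False the parent would wait ~10 s in an auto-join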

I subclassed multiprocessing.Process to switch the auto-joining off and spent some time with the source, observing whether zombies might become an issue. Because the modification turns off automatic joining in the parent, I recommend using "forkserver" as start-method for new processes on Unix (always a good idea if the parent is already multi-threaded) to prevent zombie children from sticking around as long as the parent is still running. When the parent process terminates, its child zombies eventually get reaped by systemd/init. Running multiprocessing.log_to_stderr() shows everything shutting down cleanly, so nothing seems broken so far.

Consider this approach experimental, but it's probably a lot safer than using raw os.fork() to re-invent part of the extensive multiprocessing machinery just to add this one feature. For error handling in the child, write a try-except block and log to file.
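
For example, a minimal sketch of such error-handling (wrapper name and log file are assumptions); pass `calculate_logged` instead of `calculate` as `target`:

import logging

from funcs import calculate


def calculate_logged(data, *args):
    # log to file because the child's stderr may be gone
    # together with the parent's terminal
    logging.basicConfig(filename='calculate.log', level=logging.ERROR)
    try:
        calculate(data, *args)
    except Exception:
        logging.exception("calculate() failed")
        raise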

orphan.py

import multiprocessing.util
import multiprocessing.process as mpp
import multiprocessing as mp

__all__ = ['OrphanProcess']


class OrphanProcess(mp.Process):
    """Process which won't be joined by parent on parent shutdown."""
    def start(self):
        super().start()
        mpp._children.discard(self)

    def __del__(self):
        # The finalizer won't `.join()` the child because we discarded it,
        # so this is the last chance to reap a possible zombie from within
        # Python. Otherwise systemd/init will reap it eventually.
        self.join(0)

orph_main.py

#!/usr/bin/env python3
# UNIX: enable executable from terminal with: chmod +x filename
if __name__ == '__main__':

    import time
    import logging
    import multiprocessing as mp
    from orphan import OrphanProcess
    from funcs import print_msg, calculate

    mp.set_start_method("forkserver")
    mp.log_to_stderr(logging.DEBUG)

    p = OrphanProcess(target=calculate, args=("data", 42))
    p.start()
    time.sleep(1)
    print_msg("exiting")

Again, test from a terminal to get the child's prints to stdout. When the shell appears to be hanging after everything was printed over the second prompt, hit enter to get a new prompt. The parent-id stays the same here because, from the OS point of view, the parent is the forkserver process, not the initial main process of orph_main.py.

$> ./orph_main.py
[INFO/MainProcess] created temp directory /tmp/pymp-bd75vnol
[INFO/OrphanProcess-1] child process calling self.run()
[Fri Oct 23 21:18:29 2020, pid: 30998] --- parent pid: 30997, start calculate()
[Fri Oct 23 21:18:30 2020, pid: 30995] --- exiting
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[DEBUG/MainProcess] running the remaining "atexit" finalizers
$> [Fri Oct 23 21:18:38 2020, pid: 30998] --- parent pid: 30997, end calculate()
[INFO/OrphanProcess-1] process shutting down
[DEBUG/OrphanProcess-1] running all "atexit" finalizers with priority >= 0
[DEBUG/OrphanProcess-1] running the remaining "atexit" finalizers
[INFO/OrphanProcess-1] process exiting with exitcode 0
Darkonaut
  • Here, with `subprocess.Popen()`, can we wait for its output and continue after the subprocess has completed? – alper Aug 07 '21 at 23:36
  • @alper I think you're looking for [`Popen.wait()`](https://docs.python.org/3/library/subprocess.html#subprocess.Popen.wait). – Darkonaut Aug 07 '21 at 23:50
  • Thanks, `output, error = p.communicate(); p.wait()` was what I was looking for (see the sketch after these comments). On `orph_main.py` and `subp_main.py`, is it recommended to keep the `imports` inside `__main__`, or could we use them at the top of the file? – alper Aug 08 '21 at 16:00
  • @alper Sure you can put the imports at the top of the file. I just chose to put them below there to emphasize that I only need them loaded when the file gets run as top-level script, which here is the only use case that makes sense since there are no definitions above `'__main__'` one might want to import from _another_ file. – Darkonaut Aug 08 '21 at 17:32
  • Would it be possible to send a bytes variable inside `*args` instead of a string? I am getting a `TypeError: expected str, bytes or os.PathLike object, not int` error – alper Aug 09 '21 at 10:40
  • I think we have to use the name `*_args` instead of `args`. `_args[idx]` returns a valid value, but `args` returns `data = 'funcname' _args = ("")`. I had to do `_args = make_tuple(str(args))`, which can fetch items as `_args[0]` and so on – alper Aug 09 '21 at 12:00
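
For reference, a minimal sketch of waiting on a child started with `subprocess.Popen()`, reusing the argument style of subp_main.py above (file name and data are illustrative). Note that `communicate()` already waits for the process to end, so a separate `wait()` afterwards is redundant:

import subprocess

# capture the child's output; communicate() blocks until the child exits
p = subprocess.Popen(
    args=["python", "funcs.py", "data", "42"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
)
output, error = p.communicate()
print(p.returncode)  # 0 on success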