Why does forking makes the parent process run the whole script again instead of continuing at fork() location?

Question

I have a fairly simple python script, reduced to the following for this question:

#!/usr/bin/env python3
import os

print(os.getpid(), "a")

def main():
    n_children = 10
    n_it = 1
    children = []
    for x in range(n_children):
        print(os.getpid(), "b", "x=", x)
        pid = os.fork()
        if pid == 0:
            for i in range(n_it):
                pass
            exit(0)
        children.append(pid)

    print("started %d children doing %d iterations each"%(n_children, n_it))

    err = 0
    for pid in children:
        r = os.wait()
        if r[1] != 0:
            print("wait", r)
            err += 1

    if err == 0:
        print("ok")
    else:
        print("err")
        exit(1)

if __name__ == '__main__':
    print(os.getpid(), __name__, "start")
    main() 
    print(os.getpid(), __name__, "end")

If I run it from my shell, or from a script I get this expected output:

16047 a
16047 __main__ start
16047 b x= 0
16047 b x= 1
16047 b x= 2
16047 b x= 3
16047 b x= 4
16047 b x= 5
16047 b x= 6
16047 b x= 7
16047 b x= 8
16047 b x= 9
started 10 children doing 1 iterations each
ok
16047 __main__ end

So far so good. Now if I run it from a xfstests script (so AFAIK called throught a series of shell scripts) I get a very strange behaviour where it seems forking makes the parent process start over... I cannot reproduce it outside of xfstest. It is unexplicable.

QA output created by 002
/home/aaptel/prog/xfstests-git/xfstests/tests/cifs/optest.py -u //localhost/test -o username=aaptel,password=aaptel,noperm,seal,vers=3.0,sec=ntlmsspi,mfsymlinks,actimeo=0,nosharesock -c 10 -i 1 /mnt/test /mnt/oplock
15911 a
15911 __main__ start
15911 b x= 0
15911 a
15911 __main__ start
15911 b x= 0
15911 b x= 1
15911 a
15911 __main__ start
15911 b x= 0
15911 b x= 1
15911 b x= 2
15911 a
15911 __main__ start
15911 b x= 0
15911 b x= 1
15911 b x= 2
15911 b x= 3
15911 b x= 4
15911 b x= 5
15911 b x= 6
15911 b x= 7
15911 a
15911 __main__ start
15911 b x= 0
15911 b x= 1
15911 b x= 2
15911 b x= 3
15911 b x= 4
15911 a
15911 __main__ start
15911 b x= 0
15911 b x= 1
15911 b x= 2
15911 b x= 3
15911 a
15911 __main__ start
15911 b x= 0
15911 b x= 1
15911 b x= 2
15911 b x= 3
15911 b x= 4
15911 b x= 5
15911 b x= 6
15911 b x= 7
15911 b x= 8
15911 b x= 9
15911 a
15911 __main__ start
15911 b x= 0
15911 b x= 1
15911 b x= 2
15911 b x= 3
15911 b x= 4
15911 b x= 5
15911 b x= 6
15911 a
15911 __main__ start
15911 b x= 0
15911 b x= 1
15911 b x= 2
15911 b x= 3
15911 b x= 4
15911 b x= 5
15911 a
15911 __main__ start
15911 b x= 0
15911 b x= 1
15911 b x= 2
15911 b x= 3
15911 b x= 4
15911 b x= 5
15911 b x= 6
15911 b x= 7
15911 b x= 8
15911 a
15911 __main__ start
15911 b x= 0
15911 b x= 1
15911 b x= 2
15911 b x= 3
15911 b x= 4
15911 b x= 5
15911 b x= 6
15911 b x= 7
15911 b x= 8
15911 b x= 9
started 10 children doing 1 iterations each
ok
15911 __main__ end

Notice how the same PID (main parent process) prints "a", "main start" and goes in the loop multiple times...?!

It is as if fork() makes the parent process start the script over instead of continuing at the fork() call location.

This is on Linux with Python 3.6.10.

Does this answer your question? [printf anomaly after "fork()"](https://stackoverflow.com/questions/2530663/printf-anomaly-after-fork) — Joseph Sible-Reinstate Monica, Jul 13 '20 at 16:52

score 0 · Answer 1 · answered Jul 13 '20 at 16:23

This is an obvious issue with the xfstests tool, and has nothing to do with Python or fork calls themselves, as you took care to document

It could help, for example, if xfstests embeds its own python interpreter, calling fork will fork the test suite itself, and it might decide to redo the current-running test from start. (As opposed from calling for from a standalone script that forks only the state of the Python interpreter. So, even if unexpected, this could hardly be called a faulty behavior.

The bad news is that you won't get an easy workaround if yur original intent is to parallelise test executions - or make a test which execute concurrent code to trigger situations you want to test. Higher level calls, such as subprocess or multiprocessing, and even executor pools with concurrent.futures all will call fork in the lower level layers.

You should check if they work, or lead to this behavior. If they work, it means they take care of the corner-case you had hist with your use case and should be preferred in place of a bare fork.

If they don't, as I suspect, the workaround for the case of having code intentionally running in parallel for your tests will include a worker running outside the control of xfstests - something using Celery would do, if you set it up correctly:

the tasks that run in parallel should have no other sideeffects than your desired one, and not use any functionality of whatever xfstests make available for your tests (special libraries, loggers, etc...) - not even an import statement including these.
You run the workers using celery, in parallel to your test suite
your main thread, running inside the test suite coordinats the calls and waits for the workers to complete - optionally making intermediate asserts on the external resource (e.g. filesystem) that you want to measure at carefully timed moments.

If you just want to run tests in parallel, just check the xfstests docs themselves on how to do it, from the "outside" of a test-suite written in Python.

xfstests doesn't embed an interpreter, it just runs a shell script in which i add a call to the script like so `python3 script.py`. — knarf, Jul 13 '20 at 16:25
well, it does some sub-process management on that shell, or that behavior would not take place. — jsbueno, Jul 13 '20 at 16:41

knarf · Accepted Answer · 2020-07-13T16:47:56.843

I've done a strace of the python process and found out that the children are doing the write:

18239 write(1</tmp/17820.out>, "18238 a\n18238 __main__ start\n182"..., 42) = 42

Note who PID don't match. 1839 prints "18238...".

I think what happens is the print() are buffered, and the flushing happens in the children. But each children gets a copy of the parent buffer and flushes it, printing the same buffer multiple times!

I've solved it by flushing stdout before the fork() like so:

        print(os.getpid(), "b", "x=", x)
        sys.stdout.flush()
        pid = os.fork()

I get the same expected output everywhere now.

I can reproduce the issue by simply running the script in the question and redirecting the output to a file:

$ ./script.py > x

x will have the duplicated output.

Related bug report: https://bugs.python.org/issue17230 (sadly marked as WONTFIX)

Why does forking makes the parent process run the whole script again instead of continuing at fork() location?

2 Answers2