
I have the following code:

#!/usr/bin/env python
# http://stackoverflow.com/questions/32192938/order-of-subprocesses-execution-and-its-impact-on-operations-atomicity

from multiprocessing import Process
from multiprocessing import Queue
import time
import os

# Define an output queue
output = Queue()

# define an example function
def f(x, output):

    time.sleep(.5)
    ppid = os.getppid()   # PPID
    pid  = os.getpid()     # PID
    # very computing intensive operation
    result = 10*x
    print "(%s, %s, %s)" % (ppid, pid, result)
    time.sleep(.5)
    # store result as tuple
    result = (ppid, pid, result)
    output.put(result)
    # return result


def queue_size(queue):
    size = int(queue.qsize())
    print size

# Print parent pid
print "Parent pid: %s" % os.getpid()

# Setup a list of processes that we want to run
processes = [Process(target=f, args=(x, output)) for x in range(1,11)]

# Run processes
for p in processes:
    p.start()

# Process has no close attribute
# for p in processes:
#     p.close()

# Exit the completed processes
for p in processes:
    p.join()


# Get process results from the output queue
print "Order of result might be different from order of print"
print "See: http://stackoverflow.com/questions/32192938/order-of-subprocesses-execution-and-its-impact-on-operations-atomicity"
print ""
results = [output.get() for p in processes]
print(results)

where I want to replace print "(%s, %s, %s)" % (ppid, pid, result) with multiple statements like this:

print "ppid: %s" % ppid
print "pid:  %s" % pid
print "result: %s" % result
print "#####################"

For this purpose I've chosen a semaphore to ensure that this output will be atomic. This is the modified version:

#!/usr/bin/env python
# http://stackoverflow.com/questions/32192938/order-of-subprocesses-execution-and-its-impact-on-operations-atomicity

from multiprocessing import Process
from multiprocessing import Queue
import threading
import time
import os

max_threads = 1
semaphore = threading.BoundedSemaphore(max_threads)

# Define an output queue
output = Queue()

# define an example function
def f(x, output):

    time.sleep(.5)
    ppid = os.getppid()   # PPID
    pid  = os.getpid()     # PID
    # very computing intensive operation
    result = 10*x

    # print "(%s, %s, %s)" % (ppid, pid, result)
    semaphore.acquire()
    print "ppid: %s" % ppid
    print "pid:  %s" % pid
    print "result: %s" % result
    print "#####################"
    semaphore.release()

    time.sleep(.5)
    # store result as tuple
    result = (ppid, pid, result)
    output.put(result)
    # return result


def queue_size(queue):
    size = int(queue.qsize())
    print size

# Print parent pid
print "Parent pid: %s" % os.getpid()

# Setup a list of processes that we want to run
processes = [Process(target=f, args=(x, output)) for x in range(1,11)]

# Run processes
for p in processes:
    p.start()

# Process has no close attribute
# for p in processes:
#     p.close()

# Exit the completed processes
for p in processes:
    p.join()


# Get process results from the output queue
print "Order of result might be different from order of print"
print "See: http://stackoverflow.com/questions/32192938/order-of-subprocesses-execution-and-its-impact-on-operations-atomicity"
print ""
results = [output.get() for p in processes]
print(results)

But it seems that those operations are not atomic (see PIDs 10269 and 10270), and the semaphore did not help. Here is the output:

Parent pid: 10260
ppid: 10260
pid:  10264
result: 40
#####################
ppid: 10260
pid:  10263
result: 30
#####################
ppid: 10260
pid:  10265
result: 50
#####################
ppid: 10260
pid:  10262
result: 20
#####################
ppid: 10260
pid:  10267
result: 70
#####################
ppid: 10260
pid:  10268
result: 80
#####################
ppid: 10260
pid:  10261
result: 10
#####################
ppid: 10260
ppid: 10260
pid:  10269
pid:  10270
result: 90
result: 100
#####################
#####################
ppid: 10260
pid:  10266
result: 60
#####################
Order of result might be different from order of print
See: http://stackoverflow.com/questions/32192938/order-of-subprocesses-execution-and-its-impact-on-operations-atomicity

[(10260, 10264, 40), (10260, 10263, 30), (10260, 10265, 50), (10260, 10267, 70), (10260, 10262, 20), (10260, 10268, 80), (10260, 10261, 10), (10260, 10270, 100), (10260, 10269, 90), (10260, 10266, 60)]

Why?

Wakan Tanka

1 Answer


You are using processes to run f, but you are trying to synchronize them with a threading semaphore. You are mixing incompatible multitasking models here. Each process runs in its own memory space with an independent program counter, so you cannot synchronize them as if they were running in a single program; when the processes are created, each child gets its own private copy of the threading.BoundedSemaphore, so acquiring it in one worker does not block the others. Threads, by contrast, run inside a single program and share memory.

In other words, every Process in processes runs as an independent program. You could try multiprocessing.Lock, but I think it does not make sense to lock independent programs only to print a debugging output.

Instead, I recommend you collapse it into a single print statement:

print("ppid: {}\n"
      "pid:  {}\n"
      "result: {}\n"
      "#####################".format(ppid, pid, result))

Note that adjacent string literals are concatenated automatically by the Python parser, and the \n escapes insert the line breaks. I also switched to the print() function and str.format(); %-style formatting is the older idiom and str.format() is preferred in newer code.

With this approach you have a lower probability of mixed output, but it may still happen, because even a single print is not guaranteed to be atomic across processes. If that is not good enough, use a multiprocessing.Lock instead of the threading.BoundedSemaphore; the acquire/release pattern stays the same.
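As a minimal sketch of that last suggestion (assuming Python 3 syntax and a hypothetical three-worker demo rather than your full script), a multiprocessing.Lock created in the parent and passed to each worker makes the whole multi-line block atomic across processes:

```python
from multiprocessing import Process, Lock
import os

def f(x, lock):
    result = 10 * x
    # Hold the cross-process lock for the entire multi-line block,
    # so no two workers can interleave their lines.
    with lock:
        print("ppid: {}".format(os.getppid()))
        print("pid:  {}".format(os.getpid()))
        print("result: {}".format(result))
        print("#####################")

if __name__ == '__main__':
    lock = Lock()  # created once in the parent and passed to every child
    processes = [Process(target=f, args=(x, lock)) for x in range(1, 4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```

Unlike a threading.BoundedSemaphore, a multiprocessing.Lock is backed by an OS-level primitive shared by the children, so acquiring it in one worker really does block the others.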

Camilo Torres
  • thank you for reply. "I think it does not make sense to lock independent programs only to print a debugging output." I agree, this was just an example and as a beginner I wanted to know if the chosen approach is correct also for more complicated critical section. Using the locks (or maybe semaphores?) is probably the best way. One thing that the user should be aware of is using lock as global variable, further info here http://stackoverflow.com/questions/28267972/python-multiprocessing-locks – Wakan Tanka Sep 01 '15 at 21:00