Multiprocessing in Python on large data set - Win10 vs Ubuntu

Question

from PIL import Image
from multiprocessing import Pool, Process
import time

img = Image.open('sunflower.jpg')
data = [x for x in img.getdata()]
new_image = Image.new("RGB", img.size)
new_image_data = []


def convert_pixels(pixel):
    r, g, b = pixel
    gray = int((r + g + b) / 3)
    gray = (gray, gray, gray)
    return gray


def convert_pixels_mp(pixels):
    start_time = time.time()
    p = Pool(6)
    result = p.map(convert_pixels, pixels)
    new_image_data.extend(result)
    p.close()
    p.join()
    end_time = str(time.time() - start_time)
    print(f"Processing took {end_time[0:5]}s")


def convert_pixels_sp(pixels):
    print("======Process started======")
    start_time = time.time()
    for i in pixels:
        result = convert_pixels(i)
        new_image_data.append(result)
    end_time = str(time.time() - start_time)
    print(f"Processing took {end_time[0:5]}s")


def save_image():
    new_image.putdata(new_image_data)
    new_image.save("gray.jpg")


if __name__ == "__main__":
    
    print("\n**Single core process**\n")
    p3 = Process(target=convert_pixels_sp, args=(data,))
    p3.start()
    p3.join()

    print("\nMultiprocessing with usage of 2 cores\n")
    # data sets
    l1 = data[0: int(len(data) / 2)]
    l2 = data[int(len(data) / 2):]
    
    # processing same data with usage of 2 cores
    p1 = Process(target=convert_pixels_sp, args=(l1,))
    p1.start()
    p2 = Process(target=convert_pixels_sp, args=(l2,))
    p2.start()
    p1.join()
    p2.join()

    # save new image
    save_image()

I have simple code that processes a large data set (in this example, a list of pixel values from an image, each stored as a tuple). For a school assignment I have to show that multiprocessing is more efficient than a single process. While struggling with this code I noticed some odd behaviour: the same code gives different results on Ubuntu and on Win10. On Windows it looks as if it is not really using multiple cores, because the second process only starts after the first one is done. When I run the same code on Ubuntu, both processes start at the same time. Why is that? I really want to understand it.
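One way to check whether the two processes really overlap is to have each one print a timestamp with `flush=True` and to send its result back explicitly. The following is only a minimal sketch, not the code from the question: `to_gray`, `worker` and `results` are hypothetical names, and the four pixels are made-up sample data.

```python
import os
import time
from multiprocessing import Process, Queue

results = []  # module-level list, playing the role of new_image_data


def to_gray(pixel):
    """Average the three channels, as convert_pixels does."""
    r, g, b = pixel
    gray = (r + g + b) // 3
    return (gray, gray, gray)


def worker(name, pixels, queue):
    # flush=True writes the line immediately instead of leaving it in a buffer
    print(f"{name} (pid {os.getpid()}) started at {time.time():.3f}", flush=True)
    converted = [to_gray(p) for p in pixels]
    results.extend(converted)     # changes only this child's copy of the list
    queue.put((name, converted))  # explicit channel back to the parent
    print(f"{name} finished at {time.time():.3f}", flush=True)


if __name__ == "__main__":
    pixels = [(10, 20, 30), (200, 100, 0), (0, 0, 255), (90, 90, 90)]
    half = len(pixels) // 2
    queue = Queue()
    jobs = [
        Process(target=worker, args=("first", pixels[:half], queue)),
        Process(target=worker, args=("second", pixels[half:], queue)),
    ]
    for job in jobs:
        job.start()
    # Drain the queue before joining so a child never blocks on a full pipe
    received = dict(queue.get() for _ in jobs)
    for job in jobs:
        job.join()
    print("parent's copy of results:", results)
    print("sent back through the queue:", received["first"] + received["second"])
```

The parent's `results` list stays empty on both operating systems, because each child process works on its own copy of the module's globals. Only the data put on the queue reaches the parent.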

Ubuntu console output:

Windows console output:

Linux uses fork to start child processes, whereas Windows spawns a fresh interpreter for each one. See https://stackoverflow.com/questions/42148344/python-multiprocessing-linux-windows-difference — Amin S, Jan 22 '23 at 00:09
Python buffers its output. In other words, the processes may have started at the same time but only show their prints when they are done, so the order of lines in the output doesn't really indicate the order of events (you can bypass this by passing `flush=True` to `print`). Apart from that, the code above doesn't do what you think it does: print new_image_data and you will find it empty. — Ahmed AEK, Jan 22 '23 at 05:22
Multiprocessing in Python has a lot of details to learn, but suffice it to say that if you are trying to show a large difference between single-core and multi-core, your task must take quite a while (it is often hard to justify the overhead of mp if the function only takes a few ms). You must also be wary that on Windows the size of the arguments to the function can add a lot of overhead, so sending lots of data by passing it as arguments can make the multiprocessing case slower than the single-core one. — Aaron, Jan 23 '23 at 15:45
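To illustrate the overhead point from the comments, here is a rough sketch that hands each worker one large slice instead of one pixel per task, then times it against a plain loop. It is an assumption-laden example rather than a benchmark: `to_gray`, `convert_chunk` and `split` are hypothetical helpers, the pixel list is synthetic, and the timings depend entirely on the machine and the start method.

```python
import time
from multiprocessing import Pool


def to_gray(pixel):
    """Average the three channels of one RGB pixel."""
    r, g, b = pixel
    gray = (r + g + b) // 3
    return (gray, gray, gray)


def convert_chunk(chunk):
    """Convert a whole slice of pixels in a single task."""
    return [to_gray(p) for p in chunk]


def split(items, n):
    """Split items into n contiguous slices of nearly equal size."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    # Synthetic data standing in for img.getdata()
    pixels = [(i % 256, (2 * i) % 256, (3 * i) % 256) for i in range(200_000)]

    start = time.perf_counter()
    serial = convert_chunk(pixels)
    print(f"single process: {time.perf_counter() - start:.3f}s", flush=True)

    start = time.perf_counter()
    with Pool(2) as pool:
        # pool.map returns the slices in the order they were submitted
        parts = pool.map(convert_chunk, split(pixels, 2))
    parallel = [p for part in parts for p in part]
    print(f"2 workers: {time.perf_counter() - start:.3f}s", flush=True)

    assert parallel == serial
```

With a function this cheap, the time spent pickling the pixel tuples and starting the workers can easily outweigh the computation, so the two-worker run is not guaranteed to be faster. A heavier per-pixel function or a larger image makes the difference easier to demonstrate.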
