
I am following this answer to handle multiple files at once using multiprocessing, but my script stalls and never finishes.

Here is my attempt:

import multiprocessing
import glob
import json

def handle_json(file):
    with open(file, 'r', encoding='utf-8') as inp, \
         open(file.replace('.json', '.txt'), 'a', encoding='utf-8', newline='') as out:
        length = json.load(inp).get('len','') #Note: each json file is not large and well formed
        out.write(f'{file}\t{length}\n')

p = multiprocessing.Pool(4)
for f, file in enumerate(glob.glob("Folder\\*.json")):
    p.apply_async(handle_json, file)
    print(f)

p.close()
p.join() # Wait for all child processes to close.

Where is the problem exactly? I thought it might be because I have 3000 JSON files, so I copied just 50 into another folder and tried with those, but the same problem occurred.

ADDED: Debugging with VS Code

Exception has occurred: RuntimeError       (note: full exception trace is shown but execution is paused at: <module>)

        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
  File "C:\Users\admin\Desktop\F_New\stacko.py", line 10, in <module>
    p = multiprocessing.Pool(4)
  File "<string>", line 1, in <module> (Current frame)

Another ADD: here is a zip file containing the sample files and the code: https://drive.google.com/file/d/1fulHddGI5Ji5DC1Xe6Lq0wUeMk7-_J5f/view?usp=share_link

Task Manager (screenshot)

Khaled
  • What tools do you use for diagnosis? What does "stall" mean for you? What does [Process Monitor](https://learn.microsoft.com/en-us/sysinternals/downloads/procmon) report for the activity of your program (filter for Path contains python3.exe)? – Thomas Weller Dec 16 '22 at 14:48
  • Does this answer your question? [Python Multiproccess with I/O](https://stackoverflow.com/questions/45093876/python-multiproccess-with-i-o) – DRTorresRuiz Dec 16 '22 at 15:00
  • I am using python idle shell on windows 10 64 bit and python version 3.9.7 – Khaled Dec 16 '22 at 15:29
  • I mean with "stall", mouse pointer is converted to a ring and the process is not ended – Khaled Dec 16 '22 at 15:30
  • For diagnosis you should really provide more information. How long did you wait? How much progress was reported (you print progress), etc. – Thomas Weller Dec 16 '22 at 15:40
  • I added a pic for task manager, I waited at least 5 Min. the json files are only 50 each one has almost 2 KB size – Khaled Dec 16 '22 at 15:53
  • OT: maybe you want to upgrade from IDLE to [PyCharm Community Edition](https://www.jetbrains.com/products/compare/?product=pycharm&product=pycharm-ce) it's free and much better. Honestly. Debugging is much better – Thomas Weller Dec 16 '22 at 15:55
  • What is "is not large"? That's subjective. What is the largest file size? What is the average file size? Programming is computer *science*. We rely on facts – Thomas Weller Dec 16 '22 at 15:57
  • There are 50 json files in the directory named "Folder", each file is only 2 KB – Khaled Dec 16 '22 at 16:00
  • @ThomasWeller I will try VS Code instead of PyCharm if it helps; I added a snippet of the debug errors from VS Code to my question – Khaled Dec 16 '22 at 16:04
  • PyCharm probably won't help in this particular issue, but in general – Thomas Weller Dec 16 '22 at 16:09
  • Here are my files with the code https://drive.google.com/file/d/1fulHddGI5Ji5DC1Xe6Lq0wUeMk7-_J5f/view?usp=share_link – Khaled Dec 16 '22 at 16:26

2 Answers


The apply_async function in multiprocessing expects the called function's arguments as an iterable (a tuple or list), even for a single argument, so you need to write e.g.:

p.apply_async(handle_json, [file])
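A minimal, self-contained sketch of that call shape, with a hypothetical `square` worker (the `if __name__ == "__main__"` guard is required on Windows, as the other answer explains):

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(2) as p:
        # args is an iterable of positional arguments -- a one-element
        # tuple or list even when the function takes a single argument
        res = p.apply_async(square, (3,))
        print(res.get())  # prints 9
```

Passing the bare string `file` instead would be unpacked character by character as separate arguments.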
match
  • I fixed it to an iterable and ran it; it outputs the index of each file, but no file is written as expected, and when it reaches the last index f it stalls. If I try to close it, it asks about killing the process. I am using the Python IDLE shell on Windows 10 – Khaled Dec 16 '22 at 15:25

On Windows you have to guard your multiprocessing code with an if __name__ == "__main__": block, see Compulsory usage of if __name__ == "__main__" in Windows while using multiprocessing [duplicate].

You also need to call get on the tasks you launched with apply_async in order to wait for them to finish (and to surface any exceptions they raised), so store the returned AsyncResult objects in a list and call get on each of them.

After these fixes, your code would look as follows:

import multiprocessing
import glob
import json

def handle_json(file):
    with open(file, 'r', encoding='utf-8') as inp, \
         open(file.replace('.json', '.txt'), 'a', encoding='utf-8', newline='') as out:
        length = json.load(inp).get('len','') #Note: each json file is not large and well formed
        out.write(f'{file}\t{length}\n')

if __name__ == "__main__":
    p = multiprocessing.Pool(4)
    tasks = []
    for f, file in enumerate(glob.glob("Folder\\*.json")):
        task = p.apply_async(handle_json, [file])
        tasks.append(task)
        print(f)

    for task in tasks:
        task.get()
    p.close()
    p.join() # Wait for all child processes to close.
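Since each file is independent and nothing needs to be returned to the parent, a simpler sketch of the same fix uses Pool.map, which submits the whole list and blocks until every task is done (the `Folder` path is the question's own):

```python
import multiprocessing
import glob
import json

def handle_json(file):
    # write "<filename>\t<len>" for each json file, as in the question
    with open(file, 'r', encoding='utf-8') as inp, \
         open(file.replace('.json', '.txt'), 'a', encoding='utf-8', newline='') as out:
        length = json.load(inp).get('len', '')
        out.write(f'{file}\t{length}\n')

if __name__ == "__main__":
    files = glob.glob("Folder\\*.json")
    with multiprocessing.Pool(4) as p:
        p.map(handle_json, files)  # blocks until every file is processed
```

map also re-raises any worker exception in the parent, so failures are not silently lost the way un-gotten apply_async results are.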
Ahmed AEK
  • Yeah, it works! thank you but there is a typo originated from my above code: task = p.apply_async(handle_json, [file]) must be instead task = p.apply_async(handle_json, file) so file should be given as iterable. – Khaled Dec 16 '22 at 19:01