I'm trying to run a Jupyter notebook file for each input in a Python list, from another notebook. I've used Jupyter Notebook's magic command %run to accomplish the task:

input_list = [1, 131, 312, 327, 348, 485, 469, 1218, 1329, 11212]
for i in input_list:
    try:
        input = i
        %run ./notebook.ipynb
    except:
        pass

The code works, but the execution time is very high, so I decided to use the multiprocessing library to execute the code faster.

The function used inside multiprocessing:

def function(i):
    try:
        input = i
        print(input)  # print the current element passed
        %run ./notebook.ipynb
    except:
        pass

The multiprocessing code:

from multiprocessing import Pool, cpu_count
from tqdm import tqdm

p = Pool(8)

# the imap iterator must be consumed for the work to actually run
list(tqdm(p.imap(function, input_list), total=len(input_list)))

p.close()
p.join()

But the problem here is that the argument passed to the function is not passed on to the notebook used in the %run magic command.

I get an error like "input is not defined".

What would be a possible solution for this problem?

  • Crossposted [here](https://discourse.jupyter.org/t/using-magic-commands-with-python-multiprocessing-libraries/14514?u=fomightez). – Wayne Jun 08 '22 at 18:43

1 Answer

It works when you follow the multiprocessing guide referenced in the code below for how to pass arguments. I'll illustrate with a minimal working example.

Make a notebook called add3.ipynb with the following contents as the only cell in it:

o = i + 3
print(f"where the input is {i}; the output is {o}\n")

Then, for your notebook to control the running with the various values you want, use the following in a code cell:

# based on https://pymotw.com/3/multiprocessing/basics.html
import multiprocessing

def worker(i):
    try:
        print(f"input is {i}\n")  # print the current element passed
        %run ./add3.ipynb
    except:
        pass


input_list = [1, 131, 312, 327, 348, 485, 469, 1218, 1329, 11212]


if __name__ == '__main__':
    jobs = []
    for i in input_list:
        p = multiprocessing.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()

I'll paste a typical run of that at the bottom of this post.


I still suggest you use papermill to do this, so you can parameterize the notebook and then save each executed version, as if it were a report.
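
For instance, here's a minimal sketch of the papermill approach, assuming papermill is installed and add3.ipynb has a cell tagged "parameters" that defines a default for i (the output file names are just illustrative):

import papermill as pm

input_list = [1, 131, 312, 327, 348, 485, 469, 1218, 1329, 11212]

for i in input_list:
    # each run saves a separate executed notebook, like a report
    pm.execute_notebook(
        "add3.ipynb",
        f"add3_out_{i}.ipynb",
        parameters={"i": i},
    )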

Alternatively, you can use other means to inject code or construct the notebook to run with the input value. A lot of the time I use a template string inside a script, with a placeholder for the value. I then run the script to generate the notebooks with the values in them using the string .replace() method, save the resulting strings as notebook files, and run those notebooks using jupytext or jupyter nbconvert. nbformat can be useful for building such a notebook file, too. That way you can generate reports in notebook form with the results from each run; a rough sketch follows.
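
Here is a rough sketch of that template idea, assuming nbformat and jupyter nbconvert are installed (the INPUT_VALUE placeholder and file names are hypothetical):

import subprocess
import nbformat as nbf

# template with a placeholder for the value; mirrors the add3 example
template = 'i = INPUT_VALUE\no = i + 3\nprint(f"where the input is {i}; the output is {o}")'

for i in [1, 131, 312]:
    nb = nbf.v4.new_notebook()
    # substitute the real value and make the result the notebook's one code cell
    nb.cells.append(nbf.v4.new_code_cell(template.replace("INPUT_VALUE", str(i))))
    fn = f"run_{i}.ipynb"
    nbf.write(nb, fn)
    # execute the generated notebook in place with nbconvert
    subprocess.run(["jupyter", "nbconvert", "--to", "notebook",
                    "--execute", "--inplace", fn], check=True)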

Also, if you don't need the code you're calling to be in a notebook, it is often more convenient to save it as a Python script (ending in .py) or an IPython script (ending in .ipy). (The latter allows you to use IPython magics in a script and is often an easier way to develop when you are used to Jupyter. However, the resulting script runs much slower than pure Python, so I usually end up converting to pure Python and only use the .ipy form early in development.) For example, the contents of the one cell in my example add3.ipynb could simply have been saved as a script add3.py. Then, from a notebook, I can run it like the following (leaving out multiprocessing for the sake of simplicity):

input_list = [1, 131, 312, 327, 348, 485, 469, 1218, 1329, 11212]
for i in input_list:
    %run -i add3.py

Note the use of the -i option with %run to "run the file in IPython's namespace instead of an empty one." That option isn't necessary when using %run to run another notebook, because by default it's as if you are running the other notebook inside the calling notebook. I like the greater flexibility of using %run in conjunction with a script, because often I don't want the script running in the same namespace. The alternatives I mentioned (papermill, jupytext, and jupyter nbconvert) all execute an external notebook separately from the current namespace.


Result seen when running the minimal working example:

input is 1

input is 131

input is 312
input is 327


input is 348
input is 485


input is 469
input is 1218


input is 11212
input is 1329

where the input is 131; the output is 134


where the input is 1; the output is 4
where the input is 312; the output is 315
where the input is 327; the output is 330


where the input is 485; the output is 488


where the input is 1218; the output is 1221
where the input is 469; the output is 472


where the input is 348; the output is 351

where the input is 1329; the output is 1332

where the input is 11212; the output is 11215
Wayne
  • Is there a way to limit the number of cores used in the first code snippet? The first code snippet is working fine, but it's utilizing all the cores; is it possible to tune that code to work with a limited number of cores? – Nitheeswaran B Jun 13 '22 at 09:13
  • Did you see [here](https://stackoverflow.com/a/56296239/8508004) or [here](https://stackoverflow.com/a/33480849/8508004) and [here](https://stackoverflow.com/a/20886753/8508004)? – Wayne Jun 13 '22 at 16:42
  • In the two references mentioned above, the recommended answer is to use multiprocessing.Pool, but if I use **Pool.map** or **Pool.imap** (I used the same code mentioned in the question itself), the values from the input_list are not passed to the other/child notebook (add3.ipynb). I've executed the code as per the third reference, but that code is also unable to pass values to the child notebook. Is there any other way this task can be implemented? – Nitheeswaran B Jun 14 '22 at 05:10
  • Oh yes, that puts you back at the original problem. Well, I did mean to say that the original code was meant as a framework for using multiprocessing in the notebook. You'd have to modify it to control it rather than just run the loop directly. Did you try controlling how many are run using the loop, doing it in smaller batches? For example, three at a time? You can change `for i in input_list:` to `for indx, i in enumerate(input_list):`, and then at the end of the loop you can put a conditional `if indx % 3 == 0:` controlling a pause using `time.sleep()` (see the sketch after these comments). – Wayne Jun 14 '22 at 12:09
  • (Again, what I wrote was just an idea. Maybe three is too few, and I haven't tried the code yet.) Make the pause the length it takes to do one job (or maybe slightly longer) so you only feed about three jobs at a time. Or, another tack: have you tried looking at multithreading? Maybe that allows you to better control the number of cores while still allowing what you want to work from within a notebook. – Wayne Jun 14 '22 at 12:10
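
A minimal sketch of the batching idea from the comments above, assuming the same add3.ipynb and the fork-based multiprocessing.Process pattern from the answer; the batch size of 3 is illustrative, and it waits with join() instead of a timed sleep:

# run the jobs in fixed-size batches so only a limited number of
# processes (roughly, cores) is in use at any one time
import multiprocessing

def worker(i):
    print(f"input is {i}\n")  # print the current element passed
    %run ./add3.ipynb

input_list = [1, 131, 312, 327, 348, 485, 469, 1218, 1329, 11212]
batch_size = 3  # illustrative; tune to the number of cores you want busy

for start in range(0, len(input_list), batch_size):
    batch = [multiprocessing.Process(target=worker, args=(i,))
             for i in input_list[start:start + batch_size]]
    for p in batch:
        p.start()
    for p in batch:
        # wait for the whole batch to finish before launching the next one
        p.join()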