Here is a little snippet that I came up with:
    def jobs_manager():
        from IPython.lib.backgroundjobs import BackgroundJobManager
        from IPython.core.magic import register_line_magic
        from IPython import get_ipython

        jobs = BackgroundJobManager()

        @register_line_magic
        def job(line):
            ip = get_ipython()
            jobs.new(line, ip.user_global_ns)

        return jobs
It uses the built-in module IPython.lib.backgroundjobs, so the code is small and simple, and no new dependencies are introduced.
I use it like this:
    jobs = jobs_manager()

    %job [fetch_url(_) for _ in urls]  # saves html file to disk
    Starting job # 0 in a separate thread.
Then you can monitor the state with:
    jobs.status()

    Running jobs:
    1 : [fetch_url(_) for _ in urls]

    Dead jobs:
    0 : [fetch_url(_) for _ in urls]
If a job fails, you can inspect its stack trace with

    jobs.traceback(0)
There is no built-in way to kill a job, so I carefully use this dirty hack:
    def kill_thread(thread):
        import ctypes

        id = thread.ident
        code = ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_long(id),
            ctypes.py_object(SystemError)
        )
        if code == 0:
            raise ValueError('invalid thread id')
        elif code != 1:
            ctypes.pythonapi.PyThreadState_SetAsyncExc(
                ctypes.c_long(id),
                ctypes.c_long(0)
            )
            raise SystemError('PyThreadState_SetAsyncExc failed')
It asynchronously raises SystemError in the given thread. So, to kill a job, I do

    kill_thread(jobs.all[1])
To kill all running jobs, I do

    for thread in jobs.running:
        kill_thread(thread)
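The same hack can be tried on a plain threading.Thread outside of IPython. Here is a minimal, simplified sketch (the busy_loop function and the stopped list are made up for illustration; the real kill_thread above also handles the code != 1 case):

    import ctypes
    import threading
    import time

    def kill_thread(thread):
        # simplified version of the hack above: ask CPython to raise
        # SystemError asynchronously in the target thread
        code = ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_long(thread.ident),
            ctypes.py_object(SystemError)
        )
        if code == 0:
            raise ValueError('invalid thread id')

    stopped = []

    def busy_loop():
        try:
            while True:
                pass  # pure bytecode, so the async exception gets delivered
        except SystemError:
            stopped.append(True)

    t = threading.Thread(target=busy_loop)
    t.daemon = True
    t.start()
    time.sleep(0.1)  # let the loop start
    kill_thread(t)
    t.join(timeout=5)

After the join, the thread is dead and stopped is [True]: the loop spends all its time in Python bytecode, so the pending exception is picked up almost immediately.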
I like to use %job with a widget-based progress bar https://github.com/alexanderkuk/log-progress like this:

    %job [fetch_url(_) for _ in log_progress(urls, every=1)]

http://g.recordit.co/iZJsJm8BOL.gif
One can even use %job instead of multiprocessing.pool.ThreadPool:

    for chunk in get_chunks(urls, 3):
        %job [fetch_url(_) for _ in log_progress(chunk, every=1)]

http://g.recordit.co/oTVCwugZYk.gif
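For comparison, the blocking equivalent with multiprocessing.pool.ThreadPool would look something like this (fetch_url here is a dummy stand-in, since the real one is not shown):

    from multiprocessing.pool import ThreadPool

    def fetch_url(url):
        # stand-in for the real fetch_url, which saves html to disk
        return len(url)

    urls = ['http://a.example', 'http://b.example', 'http://c.example']

    with ThreadPool(3) as pool:
        # pool.map blocks until every url is processed,
        # whereas %job returns immediately and runs in the background
        results = pool.map(fetch_url, urls)

The trade-off is that ThreadPool collects return values for you, while %job keeps the notebook responsive.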
Some obvious problems with this code:

- You cannot use arbitrary code in %job: there can be no assignments and no prints, for example. So I use it with routines that store results on disk.
- Sometimes the dirty hack in kill_thread does not work. I think that is why IPython.lib.backgroundjobs does not have this functionality by design: if the thread is doing a system call like sleep or read, the exception is ignored.
- It uses threads, and Python has a GIL, so %job cannot be used for heavy computations that spend their time in Python bytecode.
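When you control the routine yourself, a cooperative stop flag sidesteps the unreliable kill_thread hack, including the sleeping-thread case. This is a sketch with a hypothetical fetch_urls routine (the loop body stands in for real work):

    import threading

    stop = threading.Event()
    processed = []

    def fetch_urls(urls):
        for url in urls:
            if stop.is_set():
                break
            processed.append(url)  # stand-in for fetching and saving the url
            # Event.wait doubles as an interruptible sleep: unlike time.sleep,
            # it returns as soon as stop is set
            if stop.wait(timeout=0.01):
                break

    t = threading.Thread(target=fetch_urls, args=(['u%d' % i for i in range(1000)],))
    t.start()
    while len(processed) < 3:  # let a few items through
        pass
    stop.set()
    t.join(timeout=5)

The thread exits cleanly after at most one more item, because it checks the flag between urls instead of waiting for an asynchronous exception to land.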