I believe that scheduling a periodic run of an IPython notebook from Cloud Datalab is too much of an anti-pattern to be encouraged.
The Jupyter "server" runs inside a container on a Compute Engine VM instance.
At first thought, one could hope to achieve this by converting the "notebook" to a regular Python module and then running it remotely; the problem is any third-party library dependencies you might have in place. Even if there are no such dependencies, the software required to convert the notebook isn't installed on the container's image, so you'd need to reinstall it on every run between instance restarts.
You could also convert it "yourself". While this is not guaranteed to run successfully in every case without deeper research into the notebook format (even though at first glance it doesn't seem too complicated), I'll demonstrate one way to do it below.
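For reference, the .ipynb format is plain JSON: a top-level "cells" list, where each cell has a "cell_type" and its "source" as a list of lines. A minimal local sketch of the extraction idea (the `extract_code` helper and its `exclude` parameter are my own naming, not part of any library):

```python
def extract_code(nb, exclude=()):
    """Return the concatenated source of all code cells in a parsed
    .ipynb dict, skipping any line whose stripped text is in `exclude`."""
    return "\n".join(
        line.rstrip("\n")
        for cell in nb["cells"]
        if cell["cell_type"] == "code"
        for line in cell["source"]
        if line.strip() not in exclude
    )

# Example with an in-memory notebook; with a real file you'd do
# `nb = json.load(open(path))` instead.
nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# A title\n"]},
        {"cell_type": "code", "source": ["from bs4 import BeautifulSoup\n", "x = 1 + 1\n"]},
        {"cell_type": "code", "source": ["print(x)"]},
    ]
}
source = extract_code(nb, exclude=("from bs4 import BeautifulSoup",))
exec(source)  # runs the recovered code
```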
So, let's pipe the notebook's source code to the exec built-in function, loading our dependencies first so the exec call runs successfully. All of this remotely, through the datalab container running on the VM instance.
$ project= #TODO edit
$ zone= #TODO edit
$ instance= #TODO edit
$ gcloud compute ssh $instance --project $project --zone $zone -- docker exec datalab python '-c """
import json
import imp
#TODO find a better way and not escape quote characters?...
bs4 = imp.load_package(\"bs4\", \"/usr/local/envs/py2env/lib/python2.7/site-packages/bs4\")
BeautifulSoup = bs4.BeautifulSoup
notebook = \"/content/datalab/notebooks/notebook0.ipynb\"
source_exclude = (\"from bs4 import BeautifulSoup\",)
with open(notebook) as fp:
    source = \"\n\".join(line for cell in json.load(fp)[\"cells\"] if cell[\"cell_type\"] == \"code\" for line in cell[\"source\"] if line.strip() not in source_exclude)
#print(source)
exec(source)
"""
'
So far, I couldn't find another way to avoid escaping the quote characters, as my bash expertise is limited.
You'll also, at the very least, get warnings about some dependencies of the imp.load_package-loaded library not being available. This reminds us that this approach doesn't scale at all.
I don't know what you think of this, but it's probably better to put the Python source code you want to run in a Cloud Function and then trigger that function with Cloud Scheduler. Check out this community example for that.
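For illustration, such a function could look like the sketch below. The names `scheduled_job` and `run_pipeline` are hypothetical; deployment (e.g. `gcloud functions deploy --trigger-http --entry-point scheduled_job`) and the Cloud Scheduler HTTP target are configured separately.

```python
# Hypothetical HTTP-triggered Cloud Function (Python runtime).
# Cloud Scheduler can be pointed at the function's HTTPS URL on a cron schedule.

def run_pipeline():
    """Placeholder for the logic that used to live in the notebook cells."""
    return sum(range(10))

def scheduled_job(request):
    """Entry point invoked on each HTTP request from Cloud Scheduler."""
    result = run_pipeline()
    return "done: {}".format(result), 200
```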
I believe a reasonable takeaway from this post is that a notebook has different use cases than a Python module.
Also, make sure to go through the Cloud Datalab documentation to at least understand some of the concepts this answer relates to.