
I've been tasked with automating the scheduling of some notebooks that live on AI Platform Notebooks and are run daily via the Papermill operator, but actually doing this through Cloud Composer is giving me some trouble.

Any help is appreciated!

Kuba Wernerowski

2 Answers


The first step is to create a JupyterLab notebook. If you want to use additional libraries, install them and restart the kernel (the Restart Kernel and Clear All Outputs option). Then, define the processing inside your notebook.

When it's ready, remove all the runs, peeks and dry runs before you start the scheduling phase.

Now you need to set up a Cloud Composer environment (remember to install the additional packages you used in the first step). To schedule the workflow, go to JupyterLab and create a second notebook that generates a DAG from the workflow.
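
For reference, a Composer DAG that runs a notebook with Papermill can be as small as the sketch below. This assumes Airflow 2 with the apache-airflow-providers-papermill package installed in the Composer environment; the bucket and notebook paths are placeholders.

    # Minimal sketch: run a notebook daily with the Papermill provider.
    # Assumes Airflow 2 and apache-airflow-providers-papermill are available
    # in the Composer environment; the gs:// paths below are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.papermill.operators.papermill import PapermillOperator

    with DAG(
        dag_id="daily_notebook_run",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PapermillOperator(
            task_id="run_notebook",
            input_nb="gs://my-bucket/notebooks/daily_job.ipynb",            # placeholder
            output_nb="gs://my-bucket/notebooks/daily_job_{{ ds }}.ipynb",  # executed copy per run
            parameters={"run_date": "{{ ds }}"},  # passed to the notebook's parameters cell
        )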

The final step is to upload the zipped workflow to the Cloud Composer DAGs folder. You can then manage your workflow from the Airflow UI.
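
If you prefer to do that upload programmatically instead of through the console or gsutil, a sketch using the google-cloud-storage client could look like this (the bucket name is a placeholder for your environment's DAGs bucket):

    # Upload the zipped workflow into the Composer environment's dags/ folder.
    # The bucket name is a placeholder; look it up in the Composer environment details.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("us-central1-my-env-12345-bucket")
    bucket.blob("dags/workflow.zip").upload_from_filename("workflow.zip")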

I recommend taking a look at this article.

Another solution you can use is Kubeflow, which aims to make running ML workloads on Kubernetes simple, portable, and scalable. Kubeflow adds resources to your cluster to assist with a variety of tasks, including training and serving models and running Jupyter notebooks. You can find an interesting tutorial on Codelabs.
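
As a rough illustration, a Kubeflow Pipelines definition that runs a notebook with Papermill inside a container could look like the sketch below. This assumes the KFP v1 SDK; the container image and notebook paths are placeholders.

    # Minimal sketch of a Kubeflow pipeline step that executes a notebook with papermill.
    # Assumes the KFP v1 SDK; the image and gs:// paths are placeholders.
    import kfp
    from kfp import dsl

    @dsl.pipeline(name="notebook-pipeline", description="Run a notebook with papermill")
    def notebook_pipeline():
        dsl.ContainerOp(
            name="run-notebook",
            image="gcr.io/my-project/papermill-runner:latest",  # image with papermill installed
            command=["papermill"],
            arguments=[
                "gs://my-bucket/notebooks/daily_job.ipynb",
                "gs://my-bucket/notebooks/daily_job_output.ipynb",
            ],
        )

    if __name__ == "__main__":
        # Compile to a pipeline package that can be uploaded in the Kubeflow Pipelines UI.
        kfp.compiler.Compiler().compile(notebook_pipeline, "notebook_pipeline.yaml")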

I hope you find this information useful.

aga

This blog post on Medium, "How to Deploy and Schedule Jupyter Notebook on Google Cloud Platform", describes how to run Jupyter notebook jobs on a Compute Engine instance and schedule them using GCP's Cloud Scheduler > Cloud Pub/Sub > Cloud Functions. (Unfortunately, the post may be paywalled.)
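
The Cloud Functions piece of that chain can be quite small. A minimal sketch, assuming a Pub/Sub-triggered function on the Python runtime with google-api-python-client available; the project, zone, and instance names are placeholders:

    # Pub/Sub-triggered Cloud Function that starts the Compute Engine VM
    # which runs the notebook job. Requires google-api-python-client in requirements.txt.
    import googleapiclient.discovery

    PROJECT = "my-project"    # placeholder
    ZONE = "us-central1-a"    # placeholder
    INSTANCE = "notebook-vm"  # placeholder

    def start_notebook_vm(event, context):
        """Entry point: start the VM that runs the notebook job."""
        compute = googleapiclient.discovery.build("compute", "v1")
        compute.instances().start(project=PROJECT, zone=ZONE, instance=INSTANCE).execute()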

If you must use Cloud Composer, then you might find this answer to a related question, "ETL in Airflow aided by Jupyter Notebooks and Papermill", useful.
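
If you go that route, one way to run a notebook from a Composer DAG is to call Papermill directly from a PythonOperator. A minimal sketch, assuming Airflow 2 with papermill installed in the Composer environment; the notebook paths are placeholders:

    # Minimal sketch: execute a notebook from a PythonOperator with papermill.
    # Assumes Airflow 2 and papermill are installed; the gs:// paths are placeholders.
    from datetime import datetime

    import papermill as pm
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def run_notebook(**context):
        # Execute the notebook and write an executed copy for this run date.
        pm.execute_notebook(
            "gs://my-bucket/notebooks/daily_report.ipynb",
            f"gs://my-bucket/notebooks/daily_report_{context['ds']}.ipynb",
            parameters={"run_date": context["ds"]},
        )

    with DAG(
        dag_id="papermill_python_operator",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="run_notebook", python_callable=run_notebook)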

hwu76