
I want to do CI/CD for my Databricks notebooks. These are the steps I followed:

  1. I have integrated my Databricks workspace with Azure Repos.
  2. Created a build artifact with a YAML pipeline that packages my notebook.
  3. Deployed the build artifact into the Databricks workspace, also via YAML (a sketch of such a pipeline follows this list).
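
For context, a minimal sketch of such a build-and-deploy pipeline. All names, paths, and variables here are illustrative placeholders, not the actual setup:

```yaml
# Illustrative azure-pipelines.yml: publish notebooks as a build artifact,
# then import them into a Databricks workspace with the Databricks CLI.
trigger:
  - main

stages:
  - stage: Build
    jobs:
      - job: Package
        steps:
          # Publish the notebooks folder as a pipeline artifact
          - publish: $(System.DefaultWorkingDirectory)/notebooks
            artifact: notebooks

  - stage: Deploy
    dependsOn: Build
    jobs:
      - job: ImportNotebooks
        steps:
          - download: current
            artifact: notebooks
          - script: |
              pip install databricks-cli
              databricks workspace import_dir \
                "$(Pipeline.Workspace)/notebooks" /Shared/my-project --overwrite
            env:
              # Secret pipeline variables, not committed to the repo
              DATABRICKS_HOST: $(databricksHost)
              DATABRICKS_TOKEN: $(databricksToken)
            displayName: Import notebooks into the workspace
```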

Now I want to:

  1. Execute and schedule the Databricks notebook from the Azure DevOps pipeline itself.
  2. Set up multiple environments like Dev, Stage, and Prod using YAML.
  3. Have my notebook call other notebooks. Is this possible?

How can I solve this?

venkat

1 Answer


It's doable, and with Databricks Repos you really don't need to create a build artifact and deploy it; it's better to use the Repos API or the `databricks repos` CLI command to update a second checkout that will be used for tests.
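
A sketch of that approach as an Azure Pipelines step, assuming a staging checkout already exists at `/Repos/Staging/my-project` (the path, branch, and variable names are illustrative):

```yaml
# Point an existing staging checkout at the commit under test,
# instead of building and deploying an artifact.
- script: |
    pip install databricks-cli
    # Equivalent to calling the Repos REST API (PATCH /api/2.0/repos/{repo_id})
    databricks repos update --path /Repos/Staging/my-project \
      --branch "$(Build.SourceBranchName)"
  env:
    DATABRICKS_HOST: $(databricksHost)
    DATABRICKS_TOKEN: $(databricksToken)
  displayName: Update the staging checkout
```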

For testing notebooks, I always recommend the Nutter library from Microsoft, which simplifies notebook testing by letting you trigger notebook execution from the command line.
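
For example, a pipeline step that runs Nutter test notebooks from the staging checkout and publishes the results; the cluster ID, paths, and report file pattern are placeholders, and the exact options may vary by Nutter version (check `nutter run --help`):

```yaml
# Run Nutter test notebooks on an existing cluster, then publish
# the JUnit results to Azure DevOps.
- script: |
    pip install nutter
    nutter run /Repos/Staging/my-project/tests/ --cluster_id "$(clusterId)" \
      --recursive --junit_report
  env:
    DATABRICKS_HOST: $(databricksHost)
    DATABRICKS_TOKEN: $(databricksToken)
  displayName: Run notebook tests with Nutter
- task: PublishTestResults@2
  inputs:
    testResultsFormat: JUnit
    testResultsFiles: '**/test-*.xml'
```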

You can include other notebooks using the %run directive; just make sure to use relative paths instead of absolute ones, e.g. `%run ./utils/common` rather than `%run /Users/someone/utils/common`. You can organize dev/staging/prod either as folders inside Repos or as fully separate environments (for example, separate workspaces); it's up to you.
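
A hedged sketch of the folders-inside-Repos layout in pipeline YAML, assuming one checkout of the same repo per environment folder (all names are placeholders):

```yaml
# The parameter selects which environment's checkout a run targets;
# each environment is a separate Repos checkout of the same repo.
parameters:
  - name: environment
    type: string
    default: Staging
    values:
      - Dev
      - Staging
      - Prod

steps:
  - script: |
      pip install databricks-cli
      databricks repos update \
        --path "/Repos/${{ parameters.environment }}/my-project" \
        --branch "$(Build.SourceBranchName)"
    env:
      DATABRICKS_HOST: $(databricksHost)
      DATABRICKS_TOKEN: $(databricksToken)
```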

I have a demo of notebook testing and Repos integration with CI/CD; it contains all the necessary instructions for setting up dev/staging/prod, plus an Azure DevOps pipeline that tests the notebooks and triggers a release pipeline.

One thing I want to mention explicitly: for Azure DevOps you will need to use an Azure DevOps personal access token, because identity passthrough doesn't work with the APIs yet.
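
For illustration, a sketch that registers such a PAT via the Databricks Git Credentials REST API (`POST /api/2.0/git-credentials`); the variable names are placeholders, and `azureDevOpsServices` is the provider value from the public API docs:

```yaml
# Register the Azure DevOps PAT as the workspace's Git credential,
# so Repos can pull from Azure Repos when a checkout is updated.
- script: |
    curl -s -X POST "$DATABRICKS_HOST/api/2.0/git-credentials" \
      -H "Authorization: Bearer $DATABRICKS_TOKEN" \
      -d "{\"git_provider\": \"azureDevOpsServices\",
           \"git_username\": \"$GIT_USERNAME\",
           \"personal_access_token\": \"$AZURE_DEVOPS_PAT\"}"
  env:
    DATABRICKS_HOST: $(databricksHost)
    DATABRICKS_TOKEN: $(databricksToken)
    GIT_USERNAME: $(gitUsername)
    AZURE_DEVOPS_PAT: $(azureDevOpsPat)
  displayName: Register the Azure DevOps PAT with Databricks
```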

Alex Ott