
I am sure a lot of people have asked this question already. I am looking for a very simple Azure Databricks CI/CD setup using Azure DevOps. I have 3 notebooks. After I commit, I want only the notebook that was updated to be deployed, instead of the whole workspace. Is this possible?

I searched Google for a month, but every blog post or guide I found redeploys the whole workspace.

Please help!

  • The pipeline doesn't really have an easy way to know what was changed between the last deployment and this one. As such, it just deploys everything it finds. You will have to build that logic yourself, or break up the project so each notebook is in its own pipeline. You could submit an issue on the Databricks for Azure Pipelines repo. – jessehouwing Dec 05 '20 at 09:06
  • Thank you @jessehouwing, do you know any example of the logic I can refer to? – burberry398 Dec 06 '20 at 05:22
  • I don't have a ready script or anything. There are a lot of corner cases to take into account that make this hard. A naive version of a script to find out which files have changed can be found here: https://stackoverflow.com/a/61133851/736079. You then need to use this information to deploy only changed files. – jessehouwing Dec 06 '20 at 09:02

1 Answer


I am afraid there is no out-of-the-box feature to achieve this. I agree with jessehouwing: you need to write a script to find out which files have changed. If your notebooks are under Git version control, you can get the files changed by a commit with the command git diff-tree --no-commit-id --name-only -r commitId. Then, following the logic provided in this case, assign the names of the changed files to a pipeline variable with the logging command ##vso[task.setvariable variable=VariableName]value. Finally, use this variable as the source file path in the deployment step so that only the changed files are deployed.
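The steps above could be sketched roughly as follows. This is a minimal, unofficial sketch and not a ready-made solution: the folder name notebooks, the extension list, and the variable name ChangedNotebooks are assumptions chosen for illustration. It takes the commit id from the BUILD_SOURCEVERSION environment variable that Azure Pipelines sets on the agent, and joins the paths with commas because a variable set this way holds a single-line value.

```python
import os
import subprocess
from pathlib import PurePosixPath

# Assumptions for illustration: notebooks live under this repo folder
# and use one of these extensions. Adjust both to your repository.
NOTEBOOK_ROOT = "notebooks"
NOTEBOOK_EXTENSIONS = {".py", ".sql", ".scala", ".r", ".ipynb"}


def changed_files(commit_id):
    """Return the paths touched by one commit, as listed by git diff-tree."""
    result = subprocess.run(
        ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", commit_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def filter_notebooks(paths, root=NOTEBOOK_ROOT, extensions=NOTEBOOK_EXTENSIONS):
    """Keep only paths under `root` whose extension looks like a notebook."""
    root_parts = PurePosixPath(root).parts
    selected = []
    for path in paths:
        p = PurePosixPath(path)
        under_root = p.parts[: len(root_parts)] == root_parts
        if under_root and p.suffix.lower() in extensions:
            selected.append(path)
    return selected


def set_variable_command(name, value):
    """Build the Azure Pipelines logging command that sets a variable."""
    return f"##vso[task.setvariable variable={name}]{value}"


# Only runs on a pipeline agent, where BUILD_SOURCEVERSION holds the commit id.
if __name__ == "__main__" and "BUILD_SOURCEVERSION" in os.environ:
    notebooks = filter_notebooks(changed_files(os.environ["BUILD_SOURCEVERSION"]))
    # Printing the command to stdout is what makes the agent set the variable.
    print(set_variable_command("ChangedNotebooks", ",".join(notebooks)))
```

A later step could then read $(ChangedNotebooks), split it on commas, and deploy only those files. Note that this only looks at a single commit, and that deleted files also appear in the git diff-tree output, so the deployment step should skip paths that no longer exist. These are some of the corner cases mentioned in the comments.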


Mr Qian