Spot instances let you use spare capacity in the cloud at a lower price; however, if cloud demand increases, your resources will be deallocated. This is very useful for non-critical workloads where you can afford to lose some of the work done.

Databricks can run on spot instances on AWS, but there is no documentation about how to do it on Azure.

Is it possible to run Databricks clusters on Azure Spot instances?

Daniel Argüelles

1 Answer

Yes, it is possible, but not through the Databricks UI. To use Azure spot instances on Databricks, you need to use the Databricks CLI.

Note

With the CLI tool it is possible to administer (create, edit, delete) clusters and instance pools. However, to simplify the process, I'll focus on editing an existing cluster.

You can install the Databricks CLI using pip install databricks-cli and configure your credentials with databricks configure --token. For more information, see the Databricks documentation.

Run the command databricks clusters list to find the ID of the cluster you want to modify:

$ databricks clusters list
0422-112415-fifes919  Big Spark3     TERMINATED
0612-341234-jails230  Normal Spark3  TERMINATED
0212-623261-mopes727  Small 7.6      TERMINATED

In my case, I have 3 clusters. The first column is the cluster ID, the second is the cluster name, and the last is the state.
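If you want to pick the ID out of that listing from a script, here is a minimal Python sketch. It assumes the plain-text layout shown above (ID first, state last, name in between); `ClusterRow` and `parse_cluster_list` are names I made up for illustration and are not part of the CLI.

```python
from typing import List, NamedTuple


class ClusterRow(NamedTuple):
    cluster_id: str
    name: str
    state: str


def parse_cluster_list(output: str) -> List[ClusterRow]:
    """Parse the plain-text table printed by `databricks clusters list`."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or unexpected lines
        # First token is the ID and last token is the state; everything in
        # between is the name, which may contain spaces. Runs of spaces
        # inside the name are collapsed to a single space.
        rows.append(ClusterRow(parts[0], " ".join(parts[1:-1]), parts[-1]))
    return rows


# Sample text copied from the listing above.
sample = """\
0422-112415-fifes919  Big Spark3     TERMINATED
0612-341234-jails230  Normal Spark3  TERMINATED
0212-623261-mopes727  Small 7.6      TERMINATED
"""
clusters = parse_cluster_list(sample)
for row in clusters:
    print(row.cluster_id, "|", row.name, "|", row.state)
```

Splitting on whitespace is fragile if the CLI ever changes its layout, so treat this as a convenience for quick scripts only.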

The command databricks clusters get prints the cluster configuration in JSON format. Let's save it to a file so we can modify it:

databricks clusters get --cluster-id 0422-112415-fifes919 > /tmp/my_cluster.json

This file contains all the configuration related to the cluster, such as name, instance type, owner... In our case we are looking for the azure_attributes section. You will see something similar to:

...
  "azure_attributes": {
    "first_on_demand": 1,
    "availability": "ON_DEMAND_AZURE",
    "spot_bid_max_price": -1.0
  },
...

We need to change availability to SPOT_WITH_FALLBACK_AZURE and set spot_bid_max_price to our bid price. Edit the file with your favorite tool. The result should look something like:

...
  "azure_attributes": {
    "first_on_demand": 1,
    "availability": "SPOT_WITH_FALLBACK_AZURE",
    "spot_bid_max_price": 0.4566
  },
... 
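If you would rather script the change than edit the file by hand, the same modification can be sketched in Python. The helper name `use_spot_with_fallback` is hypothetical, and the dictionary below is a trimmed-down stand-in for the real file contents, not a full cluster config.

```python
import copy
import json


def use_spot_with_fallback(config: dict, max_price: float) -> dict:
    """Return a copy of a cluster config whose azure_attributes request spot VMs."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    azure = updated.setdefault("azure_attributes", {})
    azure["availability"] = "SPOT_WITH_FALLBACK_AZURE"
    azure["spot_bid_max_price"] = max_price
    return updated


# Stand-in for the parsed contents of /tmp/my_cluster.json (assumption:
# the real file has many more keys, which are passed through unchanged).
original = {
    "cluster_name": "Big Spark3",
    "azure_attributes": {
        "first_on_demand": 1,
        "availability": "ON_DEMAND_AZURE",
        "spot_bid_max_price": -1.0,
    },
}
edited = use_spot_with_fallback(original, 0.4566)
print(json.dumps(edited["azure_attributes"], indent=2))
```

With a real file you would wrap this in json.load and json.dump on /tmp/my_cluster.json. Only the two fields discussed above are touched; first_on_demand and everything else stay as they were.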

Once modified, just update the cluster with the new configuration file using databricks clusters edit:

databricks clusters edit --json-file /tmp/my_cluster.json
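For repeated use, the get / modify / edit round trip could be chained from Python. This is only a sketch: it assumes the databricks CLI is on your PATH and already configured, and `update_cluster`, its `transform` callback and the `run` parameter are hypothetical names of mine. It uses only the two CLI commands shown in this answer.

```python
import json
import subprocess
import tempfile
from typing import Callable


def update_cluster(
    cluster_id: str,
    transform: Callable[[dict], dict],
    run=subprocess.run,  # injectable so the flow can be dry-run without the CLI
) -> dict:
    """Fetch a cluster config, apply `transform`, and push the result back."""
    # Step 1: same as `databricks clusters get --cluster-id <id>`.
    got = run(
        ["databricks", "clusters", "get", "--cluster-id", cluster_id],
        capture_output=True, text=True, check=True,
    )
    new_config = transform(json.loads(got.stdout))

    # Step 2: write the modified config to a temporary JSON file.
    # delete=False keeps the file around so the CLI can read it afterwards.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        json.dump(new_config, handle, indent=2)

    # Step 3: same as `databricks clusters edit --json-file <file>`.
    run(["databricks", "clusters", "edit", "--json-file", handle.name], check=True)
    return new_config


# Example call (commented out because it needs a configured CLI):
# update_cluster("0422-112415-fifes919", my_transform)
```

Here `my_transform` stands for any function that takes the config dict and returns the modified one, for example one that sets availability and spot_bid_max_price as described above.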

Now, every time you start the cluster, the workers will be spot instances. To confirm this, you can go to the configuration tab of a worker VM allocated in the resource group managed by Databricks. You will see that Azure Spot is active, with the price you configured.

Databricks on AWS has more configuration options, such as SPOT for the availability field. However, until the Azure documentation is released, we'll have to wait or configure it by trial and error.

Daniel Argüelles