
I am trying to mount my ADLS Gen2 storage containers into DBFS, with Azure Active Directory (AAD) passthrough, using the Databricks Terraform provider. I'm following the instructions here and here, but I get the following error when Terraform attempts to deploy the mount resource:

Error: Could not find ADLS Gen2 Token

My Terraform code looks like the following (it's very similar to the example in the provider documentation), and I am deploying with an Azure Service Principal, which also creates the Databricks workspace in the same module:

provider "databricks" {
  host                        = azurerm_databricks_workspace.this.workspace_url
  azure_workspace_resource_id = azurerm_databricks_workspace.this.id
}

data "databricks_node_type" "smallest" {
  local_disk = true

  depends_on = [azurerm_databricks_workspace.this]
}

data "databricks_spark_version" "latest" {
  depends_on = [azurerm_databricks_workspace.this]
}

resource "databricks_cluster" "passthrough" {
  cluster_name            = "terraform-mount"
  spark_version           = data.databricks_spark_version.latest.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 10
  num_workers             = 1

  spark_conf = {
    "spark.databricks.cluster.profile"                = "serverless",
    "spark.databricks.repl.allowedLanguages"          = "python,sql",
    "spark.databricks.passthrough.enabled"            = "true",
    "spark.databricks.pyspark.enableProcessIsolation" = "true"
  }

  custom_tags = {
    "ResourceClass" = "Serverless"
  }
}

resource "databricks_mount" "mount" {
  for_each = toset(var.storage_containers)

  name       = each.value
  cluster_id = databricks_cluster.passthrough.id
  uri        = "abfss://${each.value}@${var.sa_name}.dfs.core.windows.net"

  extra_configs = {
    "fs.azure.account.auth.type"                   = "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class" = "{{sparkconf/spark.databricks.passthrough.adls.gen2.tokenProviderClassName}}",
  }

  depends_on = [
    azurerm_storage_container.data
  ]
}

(For clarity's sake, azurerm_storage_container.data is a set of storage containers with names from var.storage_containers, which are created in the azurerm_storage_account with name var.sa_name; hence the URI.)
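
For reference, azurerm_storage_container.data looks roughly like this (a sketch; azurerm_storage_account.this stands in for however the storage account, whose name is var.sa_name, is actually declared in my module):

resource "azurerm_storage_container" "data" {
  for_each = toset(var.storage_containers)

  # One container per name in var.storage_containers; the account
  # resource reference is illustrative
  name                 = each.value
  storage_account_name = azurerm_storage_account.this.name
}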

I feel like this error is due to a fundamental misunderstanding on my part, rather than a simple omission. My underlying assumption is that I can mount storage containers for the workspace, with AAD passthrough, as a convenience when I deploy the infrastructure in its entirety. That is, whenever users come to use the workspace, any new passthrough cluster will be able to use these mounts with zero setup.

I can mount storage containers manually, following the AAD passthrough instructions: Spin up a high-concurrency cluster with passthrough enabled, then mount with dbutils.fs.mount. This is while logged in to the Databricks workspace with my user identity (rather than the Service Principal). Is this the root of the problem; is a Service Principal not appropriate for this task?

(Interestingly, the Databricks runtime gives me exactly the same error if I try to access files on the manually created mount using a cluster without passthrough enabled.)

  • n.b., if I do everything in the above Terraform except create the mount point (i.e., up to and including creating the cluster), then mount the ADLS containers manually in that cluster, it works. It's just the Terraform mounting that fails, for some reason... – Xophmeister Feb 09 '22 at 17:51
  • It's better to open an issue at https://github.com/databrickslabs/terraform-provider-databricks if you have a problem than to ask on SO, which isn't actively monitored. What version of the provider is used? – Alex Ott Feb 17 '22 at 12:12

1 Answer


Yes, the problem arises from using a service principal for that operation. The Azure documentation for credentials passthrough says:

You cannot use a cluster configured with ADLS credentials, for example, service principal credentials, with credential passthrough.

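If the goal is just to have Terraform provision the mounts, a regular (non-passthrough) mount created with the service principal's own OAuth credentials should work instead. A minimal sketch, assuming the SP's client ID and tenant ID are exposed as variables and its secret is stored in a Databricks secret scope named sp-scope under the key sp-secret (all of these names are illustrative, not from your module):

resource "databricks_mount" "mount" {
  for_each = toset(var.storage_containers)

  name = each.value
  uri  = "abfss://${each.value}@${var.sa_name}.dfs.core.windows.net"

  # No cluster_id: the provider will spin up a small auxiliary cluster
  # for the mount operation (a passthrough cluster must not be used here)
  extra_configs = {
    "fs.azure.account.auth.type"              = "OAuth",
    "fs.azure.account.oauth.provider.type"    = "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id"       = var.sp_client_id,
    "fs.azure.account.oauth2.client.secret"   = "{{secrets/sp-scope/sp-secret}}",
    "fs.azure.account.oauth2.client.endpoint" = "https://login.microsoftonline.com/${var.tenant_id}/oauth2/token",
  }
}

Note the trade-off: such a mount authenticates as the service principal itself, so every workspace user sees the data with the SP's permissions rather than their own.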
  • So it’s not possible to provision a Databricks workspace with mounted ADLS containers, end-to-end, with an SP; you *have* to get a bona fide AAD user involved somewhere? (I feel there may be a hack whereby one could create the mount point manually, as it’s just a file/collection of files in the container, but that seems fragile to say the least!) – Xophmeister Feb 17 '22 at 17:44
  • It's possible to create mounts, but not passthrough mounts – Alex Ott Feb 17 '22 at 17:49
  • I have a [follow-up question](https://stackoverflow.com/questions/71414233/create-azure-key-vault-backed-secret-scope-in-databricks-with-aad-token), after looking into this again. I'm not sure what I'm trying to do is possible... – Xophmeister Mar 09 '22 at 18:26