
I am unable to authenticate to ADLS Gen2 when using Auto Loader. My Databricks cluster has Azure AD credential passthrough enabled, and that passthrough allows the following reads and writes from ADLS Gen2:

# Read a Parquet file from ADLS Gen2 via credential passthrough.
filepath_read = "abfss://container@storage_account.dfs.core.windows.net/output/parquet_file.pq"
filepath_write = "abfss://container@storage_account.dfs.core.windows.net/output/data_write/"

df = spark.read.parquet(filepath_read)
display(df)

# Write the DataFrame back to ADLS Gen2.
df.write.mode("overwrite").parquet(filepath_write)

I assumed this access would be sufficient to create a stream, provided useNotifications was set to false. However, creating a streaming query with Auto Loader fails. I find this odd, because with useNotifications set to false no additional Azure services are provisioned, so read and write access should be all I need. I initialized the stream with:

cloud_file = {"cloudFiles.format": "parquet",
              "cloudFiles.useNotifications": "false",
              "cloudFiles.subscriptionId": subscriptionId,
              "cloudFiles.connectionString": connection_string,
              "cloudFiles.tenantId": tenantId,
              "cloudFiles.resourceGroup": resourceGroup}
df_read_stream = (spark
    .readStream
    .format("cloudFiles")
    .options(**cloud_file)
    .schema(schema)
    .load("abfss://container@storage_account.dfs.core.windows.net/raw_landing"))

Then attempted to write the stream with:


# flatten is a user-defined function applied to each micro-batch.
df_read_stream.writeStream \
    .format("delta") \
    .outputMode("append") \
    .foreachBatch(flatten) \
    .trigger(once=True) \
    .option("checkpointLocation", checkpoint_location) \
    .start("abfss://container@storage_account.dfs.core.windows.net/transformed/transformed_delta_table")

Execution returns this error:

com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token

Do I need additional credentials to authenticate even with AD credential pass-through enabled?

1 Answer

The above error usually means that ADLS Gen2 is not properly connected to Azure Databricks.

Use the syntax below to connect ADLS Gen2 to Azure Databricks, and make sure to copy the storage account access key.


Connecting ADLS Gen2 to Azure Databricks

spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    "<storage-account-access-key>")

Or

If you want to mount the storage instead, use this code:

dbutils.fs.mount(
    source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/",
    mount_point = "/mnt/azo1",
    extra_configs = {"fs.azure.account.key.<storage-account-name>.blob.core.windows.net": "<storage-account-access-key>"})
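
Once mounted, the container is reachable through the DBFS mount point rather than an abfss URI. A quick check, reusing the example mount point /mnt/azo1 above and the illustrative Parquet path from the question, might look like:

# Read through the mount point; the file path below is illustrative and
# assumes the same container layout as the question.
df = spark.read.parquet("/mnt/azo1/output/parquet_file.pq")
display(df)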


For more information, refer to the official Microsoft documentation on configuring Auto Loader in Azure Databricks; it has a detailed explanation of reading and writing streaming data on Azure Databricks.
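
As a rough sketch, once key-based access is configured, an Auto Loader stream in directory listing mode (useNotifications set to false) should only need the format option; the subscriptionId, connectionString, tenantId, and resourceGroup options from the question are, as far as I know, only required for file notification mode. The path and schema variable below are reused from the question:

# Minimal Auto Loader read in directory listing mode (useNotifications = false).
# No subscription, connection string, tenant, or resource group options should
# be needed, since no Event Grid / queue resources are provisioned in this mode.
df_read_stream = (spark
    .readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.useNotifications", "false")
    .schema(schema)  # schema is defined elsewhere, as in the question
    .load("abfss://container@storage_account.dfs.core.windows.net/raw_landing"))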

B. B. Naga Sai Vamsi
  • Thank you @BhanunagasaiVamsi-MT. That indeed solved the problem: if I create a new cluster without credential passthrough enabled and add that token, it works. However, I am wondering why the AD credentials are not passed through to Auto Loader? I would think the token would be passed through, since I am able to read and write to ADLS Gen2. – Levi Huddleston Jun 15 '22 at 16:56