I am unable to authenticate to ADLS Gen2 when using Auto Loader. My Databricks cluster has Azure AD credential passthrough enabled, and that passthrough lets me read from and write to ADLS Gen2 as follows:
# Paths that credential passthrough can already read from and write to
filepath_read = "abfss://container@storage_account.dfs.core.windows.net/output/parquet_file.pq"
filepath_write = "abfss://container@storage_account.dfs.core.windows.net/output/data_write/"

df = spark.read.parquet(filepath_read)
display(df)
df.write.mode("overwrite").parquet(filepath_write)
I assumed this access would be enough to create a stream as long as useNotifications was set to false. However, when I try to create a streaming query with Auto Loader, it fails. I find this odd because, with useNotifications set to false, I am not provisioning any additional Azure services, so I assume read and write access should be all I need (see the minimal configuration at the end of this post). I initialized the stream with:
cloud_file = {
    "cloudFiles.format": "parquet",
    "cloudFiles.useNotifications": "false",
    "cloudFiles.subscriptionId": subscriptionId,
    "cloudFiles.connectionString": connection_string,
    "cloudFiles.tenantId": tenantId,
    "cloudFiles.resourceGroup": resourceGroup,
}
df_read_stream = (spark
    .readStream
    .format("cloudFiles")
    .options(**cloud_file)
    .schema(schema)
    .load("abfss://container@storage_account.dfs.core.windows.net/raw_landing"))
Then attempted to write the stream with:
df_read_stream.writeStream \
    .format("delta") \
    .outputMode("append") \
    .foreachBatch(flatten) \
    .trigger(once=True) \
    .option("checkpointLocation", checkpoint_location) \
    .start("abfss://container@storage_account.dfs.core.windows.net/transformed/transformed_delta_table")
Execution returns this error:
com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token
Do I need additional credentials to authenticate, even with AD credential passthrough enabled?
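
For reference, here is a minimal sketch of the directory-listing-only setup I would expect to work under my assumption that passthrough alone suffices when no notification services are involved. The schema and load path are the same ones used above; `cloud_file_minimal` and `df_minimal` are just illustrative names:

# Minimal Auto Loader config in directory-listing mode: useNotifications is
# false, so no subscription, tenant, connection-string, or resource-group
# options are passed.
cloud_file_minimal = {
    "cloudFiles.format": "parquet",
    "cloudFiles.useNotifications": "false",
}

df_minimal = (spark
    .readStream
    .format("cloudFiles")
    .options(**cloud_file_minimal)
    .schema(schema)
    .load("abfss://container@storage_account.dfs.core.windows.net/raw_landing"))

If this pattern is supposed to work with passthrough, I can drop the extra options; if not, which credential options does Auto Loader actually require here?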