I am trying to connect from Databricks to Synapse using a service principal. I have configured the service principal in the cluster configuration:

fs.azure.account.auth.type.<datalake>.dfs.core.windows.net OAuth
fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.id <Service Principal ID/Application ID>
fs.azure.account.oauth2.client.secret <Client secret key/Service Principal Password>
fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<tenant-id>/oauth2/token
fs.azure.createRemoteFileSystemDuringInitialization true
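
For reference, the same settings can also be applied at notebook scope with spark.conf.set. This is a minimal sketch assuming <datalake>, the service principal values, and <tenant-id> are replaced with your own; the per-account key suffix is applied consistently to every key here:

# Placeholder: your ADLS Gen2 account name
storage_account = "<datalake>"
suffix = f"{storage_account}.dfs.core.windows.net"

# OAuth via service principal (client credentials) for ABFS
spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{suffix}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{suffix}",
               "<Service Principal ID/Application ID>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{suffix}",
               "<Client secret key/Service Principal Password>")
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{suffix}",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")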

While I can successfully connect to the Data Lake and work with it, I cannot write to Synapse when I use the command below:

DummyDF.write.format("com.databricks.spark.sqldw")\
.mode("append")\
.option("url", jdbcUrl)\
.option("useAzureMSI", "true")\
.option("tempDir",tempdir)\
.option("dbTable", "DummyTable").save()

I get the following error:

Py4JJavaError: An error occurred while calling o831.save.
: com.databricks.spark.sqldw.SqlDWSideException: SQL DW failed to execute the JDBC query produced by the connector.
Underlying SQLException(s):
com.microsoft.sqlserver.jdbc.SQLServerException: External file access failed due to internal error: 'Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_IsDirExist. Java exception message:
HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not: AbfsRestOperationException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, HEAD, https://datalakename.dfs.core.windows.net/temp/2020-06-24/14-21-57-819/88228292-9f00-4da0-b778-d3421ea4d2ec?upn=false&timeout=90' [ErrorCode = 105019] [SQLState = S0001]

However, I can write to Synapse using this command:

DummyDF.write.mode("append").jdbc(jdbcUrl,"DummyTable")

I am not sure what is missing.

Ram

1 Answer


The second option does not use PolyBase; it goes through plain JDBC and is much slower.

I think your error is not related to Databricks or the SQL DW library, but rather to connectivity between Synapse and the storage account.

Could you check:

  • Is "Allow access to Azure services" set to ON on the firewall pane of the Azure Synapse server through Azure portal (overall remember if your Azure Blob Storage is restricted to select virtual networks, Azure Synapse requires Managed Service Identity instead of Access Keys)
  • verify if you have you correctly specified tempDir, for blob storage "wasbs://" + blobContainer + "@" + blobStorage +"/tempDirs" or*"abfss://..."* for ADLS Gen 2
  • Can you create external tables to that storage using managed identity directly from Synapse?
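
For the second point, here is a minimal sketch of building tempDir in both forms, reusing DummyDF and jdbcUrl from your question. The container name "temp" and account name "datalakename" are taken from the URL in your error message; adjust them if they differ:

# Assumptions from the error URL: container "temp", account "datalakename"
blob_container = "temp"
storage_account = "datalakename"

# Blob Storage form:
tempdir = "wasbs://" + blob_container + "@" + storage_account + ".blob.core.windows.net/tempDirs"

# ADLS Gen 2 form (matches the dfs.core.windows.net endpoint in your error):
tempdir = "abfss://" + blob_container + "@" + storage_account + ".dfs.core.windows.net/tempDirs"

DummyDF.write.format("com.databricks.spark.sqldw")\
    .mode("append")\
    .option("url", jdbcUrl)\
    .option("useAzureMSI", "true")\
    .option("tempDir", tempdir)\
    .option("dbTable", "DummyTable")\
    .save()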

Here is one article that covers solving the same error code as yours, 105019: https://techcommunity.microsoft.com/t5/azure-synapse-analytics/msg-10519-when-attempting-to-access-external-table-via-polybase/ba-p/690641

Valdas M