
I am trying to mount ADLS Gen 2 from my Databricks Community Edition workspace, but when I run the following code:

test = spark.read.csv("/mnt/lake/RAW/csds.csv", inferSchema=True, header=True)

I get the error:

com.databricks.rpc.UnknownRemoteException: Remote exception occurred:

I'm using the following code to mount ADLS Gen 2:

# Returns 1 if mntPoint is already mounted, 0 otherwise
def check(mntPoint):
  a = []
  for test in dbutils.fs.mounts():
    a.append(test.mountPoint)
  result = a.count(mntPoint)
  return result

mount = "/mnt/lake"

if check(mount)==1:
  resultMsg = "<div>%s is already mounted. </div>" % mount
else:
  dbutils.fs.mount(
    source = "wasbs://root@adlspretbiukadlsdev.blob.core.windows.net",
    mount_point = mount,
    extra_configs = {"fs.azure.account.key.adlspretbiukadlsdev.blob.core.windows.net": ""})
  resultMsg = "<div>%s was mounted. </div>" % mount

displayHTML(resultMsg)


ServicePrincipalID = 'xxxxxxxxxxx'
ServicePrincipalKey = 'xxxxxxxxxxxxxx'
DirectoryID =  'xxxxxxxxxxxxxxx'
Lake =  'adlsgen2'


# Build the OAuth token endpoint URL from the DirectoryID
Directory = "https://login.microsoftonline.com/{}/oauth2/token".format(DirectoryID)

# Create configurations for our connection
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": ServicePrincipalID,
           "fs.azure.account.oauth2.client.secret": ServicePrincipalKey,
           "fs.azure.account.oauth2.client.endpoint": Directory}



mount = "/mnt/lake"

if check(mount)==1:
  resultMsg = "<div>%s is already mounted. </div>" % mount
else:
  dbutils.fs.mount(
    source = f"abfss://root@{Lake}.dfs.core.windows.net/",
    mount_point = mount,
    extra_configs = configs)
  resultMsg = "<div>%s was mounted. </div>" % mount

I then try to read a DataFrame from ADLS Gen 2 using the following:

dataPath = "/mnt/lake/RAW/DummyEventData/CommerceTools/"

test = spark.read.csv("/mnt/lake/RAW/csds.csv", inferSchema=True, header=True)

but I get the same error:

com.databricks.rpc.UnknownRemoteException: Remote exception occurred:

Any ideas?

  • please post the whole stacktrace – Alex Ott May 16 '21 at 09:07
  • Hi @AlexOtt, do you mean ```/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client) 325 if answer[1] == REFERENCE_TYPE: --> 326 raise Py4JJavaError( 327 "An error occurred while calling {0}{1}{2}.\n". 328 format(target_id, ".", name), value)``` – Patterson May 16 '21 at 09:13
  • Or did you mean ``` Py4JJavaError Traceback (most recent call last) in ----> 1 test = spark.read.csv("/mnt/lake/RAW/csds.csv", inferSchema=True, header=True) /databricks/spark/python/pyspark/sql/readwriter.py in csv(self, path, schema, sep, encoding, quote, escape, comment, 762 path = [path] 763 if type(path) == list: --> 764 return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path))) ``` – Patterson May 16 '21 at 09:16
  • yes, include the error message from JVM - usually there should be a line, "Caused by" – Alex Ott May 16 '21 at 09:19
  • I suspect that it could be caused by the security model of the community edition that is different from the "normal" Databricks – Alex Ott May 16 '21 at 09:20
  • Hi @AlexOtt, that is what I was thinking. But I wanted to know for sure, before I started troubleshooting – Patterson May 16 '21 at 09:29
  • @AlexOtt, just so you know, there isn't a line "Caused by" – Patterson May 16 '21 at 09:45
  • Anyway it's hard to say without full stacktrace – Alex Ott May 16 '21 at 11:09
  • Hi @AlexOtt, I'm not sure how to get you the full stacktrace? SO will only allow a certain number of characters – Patterson May 17 '21 at 12:23
  • Put it into https://gist.github.com or something like and link it from post – Alex Ott May 17 '21 at 12:26
  • Hi @AlexOtt, I have done this before with github, but here you https://gist.github.com/cpatte7372/f9a820e82c5e57befa919430b1b9af45 Let me know if you can access it? Thanks – Patterson May 18 '21 at 10:23
  • @AlexOtt, so I assigned the Service Principal the Storage Blob Data Contributor role. Now, I'm able to read in the CSV using: ```test2 = spark.read.csv("abfss://root@adlspretbiukadlsdev.dfs.core.windows.net/RAW/csds.csv",inferSchema=True,header=True)``` But I'm still getting the error when reading in the same CSV with: ```test = spark.read.csv("/mnt/lake/RAW/csds.csv", inferSchema=True, header=True)``` – Patterson May 18 '21 at 12:12
  • @AlexOtt, I re-added the code to https://gist.github.com/cpatte7372/f9a820e82c5e57befa919430b1b9af45 again just in case you have to check it out – Patterson May 18 '21 at 12:19
  • Hi @AlexOtt, did you get a chance to take another look at the code? – Patterson May 18 '21 at 20:07
  • I don’t know exactly, but I suspect something specific to community edition. I suggest just use full abfss url instead of mount - community edition isn’t the same as standard databricks – Alex Ott May 18 '21 at 20:58
  • @AlexOtt thats what I thought. Thanks – Patterson May 19 '21 at 08:03

1 Answer


Based on the stacktrace, the most probable reason for that error is that you don't have the Storage Blob Data Contributor (or Storage Blob Data Reader) role assigned to your service principal (as described in the documentation). This role is different from the usual "Contributor" role, which is very confusing.
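
As suggested in the comments, on Community Edition it can be simpler to skip the mount entirely and read over abfss directly once the role is assigned. Below is a minimal sketch (not part of the original answer) using per-session OAuth settings; the account name adlspretbiukadlsdev and the credential variables are reused from the question and comments and should be adjusted to your environment:

# Sketch: per-session OAuth configuration for direct abfss access
# (account name and credential variables assumed from the question/comments)
storageAccount = "adlspretbiukadlsdev"

spark.conf.set(f"fs.azure.account.auth.type.{storageAccount}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storageAccount}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storageAccount}.dfs.core.windows.net", ServicePrincipalID)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storageAccount}.dfs.core.windows.net", ServicePrincipalKey)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storageAccount}.dfs.core.windows.net",
               "https://login.microsoftonline.com/{}/oauth2/token".format(DirectoryID))

# Read the same CSV directly, bypassing the mount
test = spark.read.csv(f"abfss://root@{storageAccount}.dfs.core.windows.net/RAW/csds.csv",
                      inferSchema=True, header=True)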
