Write/save a DataFrame to an Azure file share from Azure Databricks

Question

How can I write to an Azure file share from Azure Databricks Spark jobs?

I configured the Hadoop storage account key as follows:

spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.STORAGEKEY.file.core.windows.net",
  "SECRETVALUE"
)


val wasbFileShare =
    s"wasbs://testfileshare@STORAGEKEY.file.core.windows.net/testPath"

df.coalesce(1).write.mode("overwrite").csv(wasbFileShare)

When I try to save the DataFrame to the Azure file share, I get the following resource-not-found error, although the URI exists.

 Exception in thread "main" org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: The requested URI does not represent any resource on the server.

In Azure Storage, `wasbs:` only supports Azure Blob storage: https://datacadamia.com/azure/wasb. If you want to use an Azure file share, it seems that you need to use the SDK. — Jim Xu, Sep 24 '20 at 08:33

score 1 · Answer 1 · answered Feb 26 '21 at 17:01

Steps to connect to an Azure file share from Databricks:

First, install the Microsoft Azure Storage File Share client library for Python with pip in Databricks: https://pypi.org/project/azure-storage-file-share/

After installing it, create a storage account. You can then create a file share from Databricks:

from azure.storage.fileshare import ShareClient

# The connection string has the form:
#   FileEndpoint=https://<storage_account_name>.file.core.windows.net/;SharedAccessSignature=<sas_token>
share = ShareClient.from_connection_string(
    conn_str="<connection_string>",
    share_name="<name of the file share you want to create>",
)

share.create_share()  # creates the file share in the storage account
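The connection string format described above can be assembled from its two parts. This is only a minimal sketch: `build_connection_string` is a hypothetical helper name, not part of the SDK, and stripping a leading `?` from the SAS token is an assumption about how the token was copied.

```python
def build_connection_string(account_name: str, sas_token: str) -> str:
    """Assemble a SAS-based connection string for an Azure file share endpoint.

    Hypothetical helper for illustration; it only formats a string.
    """
    # Assumption: a SAS token copied from elsewhere may start with "?"; drop it.
    token = sas_token.lstrip("?")
    endpoint = f"https://{account_name}.file.core.windows.net/"
    return f"FileEndpoint={endpoint};SharedAccessSignature={token}"
```

The result can then be passed as `conn_str` to `ShareClient.from_connection_string`.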

The following code uploads a file to the file share from Databricks:

from azure.storage.fileshare import ShareFileClient

# conn_str format: FileEndpoint=https://<storage_account_name>.file.core.windows.net/;SharedAccessSignature=<sas_token>
file_client = ShareFileClient.from_connection_string(
    conn_str="<connection_string>",
    share_name="<your_fileshare_name>",
    file_path="my_file",  # destination path inside the share
)

# Upload a local file from the driver's filesystem.
with open("./SampleSource.txt", "rb") as source_file:
    file_client.upload_file(source_file)
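To tie this back to the original question, a DataFrame can be serialized to CSV in memory and uploaded the same way. The sketch below is one possible approach, not a definitive implementation: `rows_to_csv_bytes` and `upload_dataframe_as_csv` are hypothetical names, and it assumes the DataFrame is small enough to collect on the driver.

```python
import csv
import io


def rows_to_csv_bytes(header, rows):
    """Serialize a header and an iterable of rows to UTF-8 CSV bytes in memory."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def upload_dataframe_as_csv(df, conn_str, share_name, file_path):
    """Collect a small Spark DataFrame on the driver and upload it as one CSV file."""
    # Imported lazily so the serialization helper works without the SDK installed.
    from azure.storage.fileshare import ShareFileClient

    # Assumption: collect() pulls every row to the driver, so the data must fit in memory.
    data = rows_to_csv_bytes(df.columns, (tuple(row) for row in df.collect()))
    file_client = ShareFileClient.from_connection_string(
        conn_str=conn_str, share_name=share_name, file_path=file_path
    )
    file_client.upload_file(data)
```

A call might look like `upload_dataframe_as_csv(df, "<connection_string>", "<your_fileshare_name>", "output.csv")`. For large DataFrames, writing to storage that Spark supports natively and copying the file afterwards is likely a better fit.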

Refer to this link for further information: https://pypi.org/project/azure-storage-file-share/
