Move Files from Azure Files to ADLS Gen 2 and Back using Databricks

Question

I have a Databricks process that currently generates a set of text files, which are stored in Azure Files. These files need to be moved to ADLS Gen 2 on a scheduled basis, and then back to the file share.

How can this be achieved using Databricks?

What have you tried? Here's an explanation of how to mount Azure Files to Databricks: https://learn.microsoft.com/en-us/answers/questions/133702/read-files-from-azure-file-share-using-databricks.html But I don't suggest you do it this way. Instead, work out which web API will let you copy files directly (rather than reading them into dataframes and writing them back), and call that web API. In fact, if you are simply copying files, I suggest you don't use Databricks; use something simpler to copy the files, like Azure Automation, Azure Functions, or possibly Azure Data Factory — Nick.Mc, Aug 17 '21 at 03:39
You could in the first instance try using AzCopy, but you'll need to get the syntax exactly right. https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-files#copy-files-between-storage-accounts — Nick.Mc, Aug 17 '21 at 03:45
Here's an Azure Automation example that does what you want. https://charbelnemnom.com/sync-between-azure-file-share-and-azure-blob-container/ I suggest that before asking a question you do some googling and try a few things. — Nick.Mc, Aug 17 '21 at 03:47

score 0 · Answer 1 · answered Aug 17 '21 at 07:56

One way to access files in Azure Files from Azure Databricks is to install the azure-storage-file-share package and use the Azure Files SDK for Python.

Install library: azure-storage-file-share https://pypi.org/project/azure-storage-file-share/

Note: pip install only installs the package on the driver node, so the file must first be loaded on the driver (for example, with pandas). To use the library on Spark worker nodes, it must be installed as a Databricks library.

Python - Load file from Azure Files to Azure Databricks - Stack Overflow
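
As a rough illustration of the SDK approach, here is a minimal sketch of copying a single file in each direction on the driver node. It assumes that the azure-storage-file-datalake package is also installed for the ADLS Gen2 side (this is not mentioned above), and that connection strings are available for both accounts. The function names and parameters (copy_share_file_to_adls, copy_adls_file_to_share, build_clients, share_path, adls_path) are hypothetical and only for illustration.

```python
def copy_share_file_to_adls(share_file_client, adls_file_client):
    """Copy one file from Azure Files to ADLS Gen2; returns the bytes copied."""
    # Reads the whole file into driver memory, which suits small text files.
    data = share_file_client.download_file().readall()
    adls_file_client.upload_data(data, overwrite=True)
    return len(data)


def copy_adls_file_to_share(adls_file_client, share_file_client):
    """Copy one file from ADLS Gen2 back to Azure Files; returns the bytes copied."""
    data = adls_file_client.download_file().readall()
    # Assumption: the parent directory already exists on the file share.
    share_file_client.upload_file(data)
    return len(data)


def build_clients(files_conn_str, share_name, share_path,
                  adls_conn_str, file_system, adls_path):
    """Create the two file clients (hypothetical helper, names are assumptions)."""
    # Imported here so the copy helpers above do not require the SDKs at import time.
    from azure.storage.fileshare import ShareFileClient
    from azure.storage.filedatalake import DataLakeServiceClient

    share_file = ShareFileClient.from_connection_string(
        conn_str=files_conn_str, share_name=share_name, file_path=share_path)
    adls_file = (DataLakeServiceClient.from_connection_string(adls_conn_str)
                 .get_file_system_client(file_system)
                 .get_file_client(adls_path))
    return share_file, adls_file
```

In a notebook, you would call build_clients once per file (ideally reading the connection strings from a secret scope with dbutils.secrets.get rather than hard-coding them), then call one of the copy functions. Scheduling the notebook as a Databricks job would cover the "scheduled basis" part. For many or large files, a direct copy tool is likely a better fit than pulling the data through the driver.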

An alternative is to copy the data from Azure File Storage to ADLS Gen2 with Azure Data Factory, using the Copy activity. See: Copy data from/to Azure File Storage - Azure Data Factory & Azure Synapse | Microsoft Docs
