
I'm trying to copy 20 GB of files from one folder to another in Azure Data Lake and want to do it through Databricks. I have tried the code below, but it takes more than an hour. Can anyone suggest how to achieve this in less than 20 minutes?

import shutil, os
shutil.copytree("/dbfs/mnt/storage1/ABC/", "/dbfs/mnt/storage1/copied/")
Programmer
  • Possible duplicate of [Python: How to Copy Files Fast](https://stackoverflow.com/questions/22078621/python-how-to-copy-files-fast) –  Mar 05 '19 at 17:59

2 Answers


The best option would be to use dbutils.fs.

This would do it for you:

dbutils.fs.cp("/mnt/storage1/ABC/", "/mnt/storage1/copied/", recurse=True)
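
Note that dbutils.fs takes DBFS paths directly, so the /dbfs prefix used for the local FUSE path in the question is dropped; recurse=True copies the whole directory tree.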
Hauke Mallow

Try using the azure.datalake.store library; more details here: https://github.com/Azure/azure-data-lake-store-python

That should prevent Databricks from downloading and re-uploading the files.
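
For reference, a minimal sketch of what a copy with that library could look like, assuming an ADLS Gen1 account and service principal authentication; the tenant/client values, the store name, and the in-store paths ABC and copied are placeholders (the real paths depend on how the mount maps into the store):

import posixpath
import shutil
from azure.datalake.store import core, lib

# Placeholder service principal credentials (assumption, not from the question)
token = lib.auth(tenant_id='<tenant-id>',
                 client_id='<application-id>',
                 client_secret='<client-secret>')
adl = core.AzureDLFileSystem(token, store_name='<store-name>')

src_root = 'ABC'      # placeholder in-store paths
dst_root = 'copied'

# walk() lists every file under src_root
for src_path in adl.walk(src_root):
    rel = posixpath.relpath('/' + str(src_path), '/' + src_root)
    dst_path = posixpath.join(dst_root, rel)
    adl.mkdir(posixpath.dirname(dst_path))  # ensure the destination folder exists
    # stream the bytes from the source file to the destination file
    with adl.open(src_path, 'rb') as src, adl.open(dst_path, 'wb') as dst:
        shutil.copyfileobj(src, dst, length=4 * 1024 * 1024)

This sketch copies one file at a time; wrapping the loop in a thread pool would be the natural next step for a large folder.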

simon_dmorias