I am using saveAsTextFile() to store the results of a Spark job in the folder dbfs:/FileStore/my_result.

I can access the different "part-xxxxx" files using the web browser, but I would like to automate the process of downloading all files to my local machine.

I have tried to use cURL, but I can't find the REST API call to download a dbfs:/FileStore file.

Question: How can I download a dbfs:/FileStore file to my local machine?

I am using Databricks Community Edition to teach an undergraduate module in Big Data Analytics in college. I have Windows 7 installed on my local machine. I have checked that cURL and the _netrc file are properly installed and configured, as I manage to successfully run some of the commands provided by the REST API.

Thank you very much in advance for your help! Best regards, Nacho

Nacho Castiñeiras

6 Answers

There are a few options for downloading FileStore files to your local machine.

Easier options:

  • Install the Databricks CLI, configure it with your Databricks credentials, and use the CLI's dbfs cp command. For example: dbfs cp dbfs:/FileStore/test.txt ./test.txt. If you want to download an entire folder of files, you can use dbfs cp -r (see the setup sketch after this list).
  • From a browser signed into Databricks, navigate to https://<YOUR_DATABRICKS_INSTANCE_NAME>.cloud.databricks.com/files/. If you are using Databricks Community Edition then you may need to use a slightly different path. This download method is described in more detail in the FileStore docs.
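
A minimal setup sketch for the CLI option, assuming the legacy pip-installable databricks-cli package (the token flow is an assumption; Community Edition sign-in may differ, as the comments below note):

    # A minimal sketch, assuming the legacy pip-installable databricks-cli.
    pip install databricks-cli
    # Configure with host + personal access token; on Community Edition,
    # where tokens may be unavailable, plain `databricks configure`
    # prompts for email/password instead.
    databricks configure --token
    # Single file:
    dbfs cp dbfs:/FileStore/test.txt ./test.txt
    # Whole result folder from the question (the part-xxxxx files):
    dbfs cp -r dbfs:/FileStore/my_result ./my_result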

Advanced options:

  • Use the DBFS REST API. You can access file contents using the read API call. To download a large file, you may need to issue multiple read calls to fetch successive chunks of the file; a sketch follows.
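
Since the question already has cURL and a _netrc file working, here is a sketch of that chunked-read loop. It is a POSIX shell example under stated assumptions: the file path is illustrative, jq and base64 must be available, and the read endpoint returns base64-encoded data, at most 1 MB per call.

    # Sketch: download one part file in 1 MB chunks via the DBFS REST API
    # ('read' endpoint, API 2.0). Assumes curl, jq and base64 are available;
    # -n makes curl use the _netrc credentials from the question.
    # Host and file path are placeholders.
    HOST="https://<YOUR_DATABRICKS_INSTANCE_NAME>.cloud.databricks.com"
    FILE="/FileStore/my_result/part-00000"
    CHUNK=1048576        # the read call returns at most 1 MB per request
    OFFSET=0
    : > part-00000       # create/truncate the local output file
    while true; do
      RESP=$(curl -sn "$HOST/api/2.0/dbfs/read?path=$FILE&offset=$OFFSET&length=$CHUNK")
      N=$(echo "$RESP" | jq -r '.bytes_read')
      case "$N" in ''|null|0) break ;; esac       # nothing left to read
      echo "$RESP" | jq -r '.data' | base64 -d >> part-00000  # data is base64-encoded
      OFFSET=$((OFFSET + N))
    done
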
Josh Rosen
  • Hi Josh, thank you so much, your answer fixed the issue. I can now access the files using the CLI. Thank you so much again! Nacho – Nacho Castiñeiras Mar 01 '18 at 12:23
  • Hi @Nacho Castiñeiras, can you please share the steps? I logged in but do not see the "Access Token" tab in "User Settings"; I am using Community Databricks. Thanks – mdivk Feb 18 '19 at 18:16
  • Hi @NachoCastiñeiras, I too would be interested to know, how exactly you got the CLI working with the Databricks Community Edition. I am using it for my Big Data Analytics seminar too ;-) – Nicolas Jun 07 '19 at 08:27

The quickest way to download a small file from Databricks DBFS Community Edition:

When you are logged into Databricks Community Edition, the URL looks like: https://community.cloud.databricks.com/?o=<unique ID>

Upload your file to the FileStore > tables folder. The download link will then look like: https://community.cloud.databricks.com/files/tables/sample_file.csv?o=<your unique ID>

Last tested on 17-AUG-2021

Kent Pawar

Using a browser, you can access individual files in the FileStore; you cannot access or even list directories. So your file first has to be in the FileStore. If you have a file "example.txt" at "/FileStore/example_directory/", you can download it via the following URL:

https://community.cloud.databricks.com/files/example_directory/example.txt?o=###

In that URL, "###" has to be replaced by the long number you find at the end of your Community Edition URL (after you have logged into your Community Edition account).


  • This is not helpful. I am able to see the file in the FileStore but not able to download it locally. – Ceren Feb 10 '21 at 18:10

The easiest way for me was to use the display() method: render the data with display() in a notebook cell, and the results table includes an option to download the data locally.

Ref: https://www.learntospark.com/2021/04/download-data-from-dbfs-to-local.html

Modem Rakesh goud

You can use a link like the example below; it worked fine for me, even for downloading big files:

https://northeurope.azuredatabricks.net/files/<exact_file_path_with_folder(if_any)>?o=<your_unique_id>

e.g. https://northeurope.azuredatabricks.net/files/shared_uploads/mydocuments/data.json?o=89898989899


I assume you have already configured the Databricks CLI and it is working.

Note: 'databricks fs' and 'dbfs' can be used interchangeably.

Working with Databricks files

  1. Copy a file from Databricks to your local machine:

    dbfs cp <databricks path> <local path>

    Example: dbfs cp dbfs:/FileStore/PricingData/Output.csv C:\Users\Waqar\Desktop\Output

  2. Copy a file from your local machine to Databricks:

    dbfs cp <local file path, including file name> <databricks folder path>

    Example: dbfs cp C:\Users\Waqar\Desktop\Output\data.csv dbfs:/FileStore/PricingData

Working with Databricks directories

  1. Copy a directory from Databricks to your local machine:

    dbfs cp -r <databricks directory path> <local directory path>

    Example: dbfs cp -r dbfs:/FileStore/PricingData C:\Users\Waqar\Desktop\Output

  2. Copy a directory from your local machine to Databricks:

    dbfs cp -r <local directory path> <databricks directory path>

    Example: dbfs cp -r C:\Users\Waqar\Desktop\Output dbfs:/FileStore/PricingData

Note: You will have to adjust the Databricks path in each case; for example, you might have to change /dbfs/FileStore/PricingData to dbfs:/FileStore/PricingData, and so on.
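
As a small illustrative addition (not part of the original answer), you can confirm the exact DBFS path before copying by listing it first, with either form of the command:

    # Both forms are equivalent in the legacy Databricks CLI:
    databricks fs ls dbfs:/FileStore/PricingData
    dbfs ls dbfs:/FileStore/PricingData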

Waqar Khan