Is there any way by which I can download a file from HDFS using the WebHDFS REST API? The closest I have come is to use the OPEN operation to read the file and save the contents.

curl -i -L "http://localhost:50075/webhdfs/v1/demofile.txt?op=OPEN" -o ~/demofile.txt

Is there any API that will allow me to download the file directly, without having to open it? I went through the official documentation and tried Google as well, but could not find anything. Could somebody point me in the right direction or give me some pointers?

Thank you so much for your valuable time.

Tariq
  • What is wrong with the approach you're describing? You'll need to read the file at some point anyway if you want to download it locally. – Charles Menguy May 31 '13 at 21:25
  • Thank you for the reply, sir. I just want to download the file as-is and keep it in a directory on my local FS for now; reading the file is not my intention at this moment. Also, if I follow the above approach, I end up with a file that includes the headers as well: "HTTP/1.1 200 OK Content-Type: application/octet-stream Content-Length: 218 Server: Jetty(6.1.26)" – Tariq May 31 '13 at 21:38
  • The webHDFS API is for programmatic use, so using OPEN is as close as it gets if you want to use it... you still need some code to create the file. – Christophe Roussy Feb 15 '16 at 12:33
  • Not really sure how exactly this question is off-topic. Discussing APIs is what SO is meant for. – Tariq Aug 30 '17 at 08:15
  • The API call looks perfectly fine - unless the question is updated with a good reason for why anyone should care, everyone probably feels like the OP is wasting our time with a useless question. – Paulo Scardine Dec 14 '17 at 14:42
  • Sometimes reading the question with an open mind really helps. I have very clearly mentioned in my question that the API call I have shown here is the closest and certainly not exactly what I intend to achieve. I have been an active contributor on SO for years and I very well know what exactly wasting time is. 'Fine' is a relative word. What's fine with you might not be so fine with me. You can look at my last comment against the answer to verify that. The way I was doing it was not 'fine' and I had to change something to make it work. – Tariq Aug 29 '18 at 18:33
  • And as far as the fact whether or not you wish to help fellow SO users is concerned, it's totally your call. 8 Upvotes and 3 stars are a fine indication of how useful/useless this question is. – Tariq Aug 29 '18 at 18:35
  • The headers are included because the -i flag includes headers. Remove that, and you should have the "reference implementation" (see the sketch after these comments). – Rick Moritz Jul 11 '19 at 17:09
  • @Tariq I'm flagging this to be reopened. As a Hadoop administrator, these topics are not always cut-and-dried, and most of the default documentation leaves out key elements or details. This post should be open for future answers and discussion around the webhdfs API (10k views says it all). – Petro Nov 29 '19 at 17:26
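
A minimal sketch of what the comments converge on, assuming the same host, port, and file path as in the question: drop the -i flag so the HTTP headers are not written into the saved file.

# -L follows any redirect to a datanode; without -i, only the response body is saved
curl -L "http://localhost:50075/webhdfs/v1/demofile.txt?op=OPEN" -o ~/demofile.txt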

1 Answer


You could probably use the DataNode API for this (it listens on port 50075 by default); it supports a streamFile command which you could take advantage of. Using wget, this would look something like:

wget http://$datanode:50075/streamFile/demofile.txt -O ~/demofile.txt

Note that this request needs to go to the datanode itself, not to the namenode!
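
If wget is not available, the same request can presumably be made with curl (same datanode host and port assumed):

# fetch the file straight from the datanode's streamFile endpoint
curl "http://$datanode:50075/streamFile/demofile.txt" -o ~/demofile.txt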

Alternatively, if you don't know which datanode to hit, you can ask the namenode and it will redirect you to the right datanode with this URL:

http://$namenode:50070/data/demofile.txt
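
A minimal sketch of that redirect-based approach, assuming $namenode holds the namenode hostname; wget follows HTTP redirects by default, so it should land on the right datanode:

# the namenode answers with a redirect to a datanode that holds the file
wget "http://$namenode:50070/data/demofile.txt" -O ~/demofile.txt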
Charles Menguy