Is there any way by which I can download a file from HDFS using the WebHDFS REST API? The closest I have come is to use the OPEN operation to read the file and save the contents.

curl -i -L "http://localhost:50075/webhdfs/v1/demofile.txt?op=OPEN" -o ~/demofile.txt

Is there any API that will allow me to download the file directly, without having to open it? I went through the official documentation and tried Google as well, but could not find anything. Could somebody point me in the right direction or give me some pointers?

Thank you so much for your valuable time.

Tariq
  • What is wrong with the approach you're describing? You'll need to read the file at some point anyway if you want to download it locally. – Charles Menguy May 31 '13 at 21:25
  • Thank you for the reply, sir. I just want to download the file as-is and keep it in a directory on my local FS for now; reading the file is not my intention at this moment. Also, if I follow the above approach, I end up with a file that includes the headers as well: "HTTP/1.1 200 OK Content-Type: application/octet-stream Content-Length: 218 Server: Jetty(6.1.26)" – Tariq May 31 '13 at 21:38
  • The webHDFS API is for programmatic use, so using OPEN is as close as it gets if you want to use it... you still need some code to create the file. – Christophe Roussy Feb 15 '16 at 12:33
  • Not really sure how exactly this question is off-topic. Discussing APIs is what SO is meant for. – Tariq Aug 30 '17 at 08:15
  • The API call looks perfectly fine - unless the question is updated with a good reason for why anyone should care, everyone probably feels like the OP is wasting our time with a useless question. – Paulo Scardine Dec 14 '17 at 14:42
  • Sometimes reading the question with an open mind really helps. I have very clearly mentioned in my question that the API call I have shown here is the closest and certainly not exactly what I intend to achieve. I have been an active contributor on SO for years and I very well know what exactly wasting time is. 'Fine' is a relative word. What's fine with you might not be so fine with me. You can look at my last comment against the answer to verify that. The way I was doing it was not 'fine' and I had to change something to make it work. – Tariq Aug 29 '18 at 18:33
  • And as far as the fact whether or not you wish to help fellow SO users is concerned, it's totally your call. 8 Upvotes and 3 stars are a fine indication of how useful/useless this question is. – Tariq Aug 29 '18 at 18:35
  • The headers are included because the -i flag includes headers. Remove that, and you should have the "reference implementation" (see the sketch after these comments). – Rick Moritz Jul 11 '19 at 17:09
  • @Tariq I'm flagging this to be reopened. As a Hadoop administrator, these topics are not always cut-and-dried, and most of the default documentation leaves out key elements or details. This post should be open for future answers and discussion around the webhdfs API (10k views says it all). – Petro Nov 29 '19 at 17:26
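
A minimal sketch of what the comments converge on, assuming the same host, port, and file path as in the question: drop the -i flag so the HTTP headers are not written into the saved file.

# -L follows any redirect to a datanode; without -i, only the response body is saved
curl -L "http://localhost:50075/webhdfs/v1/demofile.txt?op=OPEN" -o ~/demofile.txt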

1 Answer


You could probably use the DataNode API for this (it listens on port 50075 by default); it supports a streamFile command which you could take advantage of. Using wget, this would look something like:

wget http://$datanode:50075/streamFile/demofile.txt -O ~/demofile.txt

Note that this request needs to go to the datanode itself, not to the namenode!
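
If wget is not available, the same request can presumably be made with curl (same datanode host and port assumed):

# fetch the file straight from the datanode's streamFile endpoint
curl "http://$datanode:50075/streamFile/demofile.txt" -o ~/demofile.txt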

Alternatively, if you don't know which datanode to hit, you can ask the namenode and it will redirect you to the right datanode with this URL:

http://$namenode:50070/data/demofile.txt
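
A minimal sketch of that redirect-based approach, assuming $namenode holds the namenode hostname; wget follows HTTP redirects by default, so it should land on the right datanode:

# the namenode answers with a redirect to a datanode that holds the file
wget "http://$namenode:50070/data/demofile.txt" -O ~/demofile.txt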
Charles Menguy