Questions tagged [webhdfs]

WebHDFS is a REST API that supports the complete FileSystem interface for HDFS (Hadoop Distributed File System). This API can be used to connect to a Hadoop data lake from a third-party tool such as SSIS; see "Using WebHDFS to connect Hadoop Data Lake to SSIS".

268 questions
43 votes, 3 answers

When using --negotiate with curl, is a keytab file required?

The documentation describing how to connect to a Kerberos-secured endpoint shows the following: curl -i --negotiate -u : "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=..." The -u flag has to be provided but is ignored by curl. Does the --negotiate…
Chris Snow
  • 23,813
  • 35
  • 144
  • 309
22 votes, 1 answer

WebHDFS vs HttpFS

What is the difference between the WebHDFS REST API and HttpFS? If I understand correctly: HttpFS is an independent service that exposes a REST API on top of HDFS; WebHDFS is a REST API built into HDFS. It doesn't require any further installation…
Santiago Cepas
  • 4,044
  • 2
  • 25
  • 31
14 votes, 1 answer

Is there any way to download an HDFS file using WebHDFS REST API?

Is there any way I can download a file from HDFS using the WebHDFS REST API? The closest I have come is to use the open operation to read the file and save the content. curl -i -L "http://localhost:50075/webhdfs/v1/demofile.txt?op=OPEN" -o…
Tariq
  • 34,076
  • 8
  • 57
  • 79
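
For the download question above, here is a minimal Python sketch of the same idea as the curl command: build an op=OPEN URL and stream the response body to a local file. The helper names `webhdfs_open_url` and `download` are hypothetical, the host and port are whatever the cluster actually exposes, and the sketch assumes an unsecured cluster (no Kerberos).

```python
import shutil
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_open_url(host, port, hdfs_path):
    """Build a WebHDFS op=OPEN URL for an absolute HDFS path (hypothetical helper)."""
    query = urlencode({"op": "OPEN"})
    # quote() keeps "/" intact and percent-encodes characters such as spaces
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


def download(url, local_path):
    """Stream the response body of `url` into `local_path` (hypothetical helper)."""
    # urlopen follows HTTP redirects for GET requests, similar to curl -L
    with urlopen(url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)


# Example call, using the host, port and path from the question:
# download(webhdfs_open_url("localhost", 50075, "/demofile.txt"), "demofile.txt")
```

Copying with `shutil.copyfileobj` avoids loading the whole file into memory, which matters for large HDFS files.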
10 votes, 4 answers

Hdfs put VS webhdfs

I'm loading a 28 GB file into Hadoop HDFS using WebHDFS and it takes ~25 minutes to load. I tried loading the same file using hdfs put and it took ~6 minutes. Why is there so much difference in performance? Which one is recommended? Can somebody explain or…
Chhaya Vishwakarma
  • 1,407
  • 9
  • 44
  • 72
9 votes, 2 answers

Connect to HDFS with Kerberos Authentication using Python

I am trying to connect to HDFS protected with Kerberos authentication. I have the following details but don't know how to proceed: user, password, realm, HttpFS URL. I tried the code below but I am getting an authentication error: from hdfs.ext.kerberos import…
ankit
  • 1,499
  • 5
  • 29
  • 46
9 votes, 2 answers

Namenode high availability client request

Can anyone please tell me: if I am using a Java application to request file upload/download operations on HDFS with a NameNode HA setup, where does this request go first? I mean, how would the client know which NameNode is active? It would be…
user2846382
  • 385
  • 1
  • 3
  • 16
8 votes, 1 answer

HADOOP / YARN - Are the ResourceManager and the hdfs NameNode always installed on the same host?

Are the “resource manager” and the “hdfs namenode” always installed on the same host? 1) When I want to send an HTTP request (YARN REST API) to get a new application ID, I am using this web URI: http://
Xquery
  • 187
  • 1
  • 3
  • 9
8 votes, 2 answers

Which nodejs library should I use to write into HDFS?

I have a Node.js application and I want to write data into the Hadoop HDFS file system. I have seen two main Node.js libraries that can do it: node-hdfs and node-webhdfs. Has anyone tried them? Any hints? Which one should I use in production? I am…
user3161639
  • 99
  • 1
  • 3
7 votes, 0 answers

Internet Explorer always using NTLM instead of Kerberos

I am trying to browse my HDFS system from Internet Explorer, but for some reason it always uses NTLM instead of Kerberos, so I receive the message GSSException: Defective token detected (Mechanism level: GSSHeader did not find the right…
6 votes, 1 answer

How can I Read and Transfer chunks of file with Hadoop WebHDFS?

I need to transfer big files (at least 14MB) from the Cosmos instance of the FIWARE Lab to my backend. I used the Spring RestTemplate as a client interface for the Hadoop WebHDFS REST API described here but I run into an IO Exception: Exception in…
Andrea Sassi
  • 101
  • 5
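
For the chunked-transfer question above: the WebHDFS OPEN operation accepts optional `offset` and `length` query parameters, so a large file can be fetched as a series of smaller requests. Below is a minimal Python sketch that only builds the per-chunk URLs. The function name `chunk_urls` and the example host are hypothetical, and the file size is assumed to be known already (for example from the `length` field returned by op=GETFILESTATUS).

```python
from urllib.parse import quote, urlencode


def chunk_urls(base_url, hdfs_path, file_size, chunk_size):
    """Yield one op=OPEN URL per chunk of the file (hypothetical helper).

    base_url   -- scheme, host and port of the WebHDFS endpoint (assumed)
    file_size  -- total size of the file in bytes
    chunk_size -- maximum number of bytes to request per call
    """
    for offset in range(0, file_size, chunk_size):
        # The last chunk may be shorter than chunk_size
        length = min(chunk_size, file_size - offset)
        query = urlencode({"op": "OPEN", "offset": offset, "length": length})
        yield f"{base_url}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Example: a 10-byte file read in 4-byte chunks gives three URLs
# with (offset, length) = (0, 4), (4, 4), (8, 2).
urls = list(chunk_urls("http://namenode.example:50070", "/data/big.bin", 10, 4))
```

Each URL can then be fetched and appended to the target in order, which keeps any single HTTP response small.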
5 votes, 1 answer

Docker Kerberos WebHDFS AuthenticationException: Unauthorized

I have a Spring application that reads a file from HDFS using WebHDFS. When I test it in IDEA, it works. But after I build the project and deploy the Docker image on a virtual machine locally or on a server connected to HDFS, I…
Evgenii
  • 389
  • 3
  • 7
  • 21
5 votes, 4 answers

ConnectionError(MaxRetryError("HTTPConnectionPool Max retries exceeded using pywebhdfs

Hi, I am using the pywebhdfs Python library. I am connecting to EMR and trying to create a file on HDFS. I am getting the exception below, which seems unrelated to what I am doing, as I am not hitting any connection limit here. Is it due to how…
Sam
  • 1,333
  • 5
  • 23
  • 36
5 votes, 1 answer

WebHDFS not working on a secure hadoop cluster

I am trying to secure my HDP2 Hadoop cluster using Kerberos. So far HDFS, Hive, HBase, Hue Beeswax and the Hue job/task browsers are working properly; however, Hue's File Browser is not working. It answers: WebHdfsException at…
Arnaud
  • 273
  • 1
  • 3
  • 15
5 votes, 1 answer

Spring support for WebHDFS

Is there any Spring support for WebHDFS? I didn't find any useful links on Google. I want to connect to Hadoop with normal authentication and Kerberos authentication via WebHDFS. Is this supported in Spring? Any useful links will be helpful. Thanks
user608020
  • 313
  • 4
  • 15
4 votes, 4 answers

Know the disk space of data nodes in hadoop?

Is there a way or a command by which I can find out the disk space of each DataNode or the total cluster disk space? I tried the command dfs -du -h / but it seems that I do not have permission to execute it for many directories and hence…
Djeah
  • 320
  • 8
  • 21