
I have a Hadoop cluster with Kerberos enabled. I want to put files on HDFS from a Windows/Linux machine outside the cluster.

The Hadoop admin team has provided me with a username to access Hadoop and a keytab file. How should I use them in my Java code?

I went through many resources on the internet, but none of them gives any guidance on accessing Kerberized Hadoop from outside the cluster.

Also, is it necessary to run the code using hadoop jar? If yes, how would I run it from outside the cluster?

Reference

http://blog.rajeevsharma.in/2009/06/using-hdfs-in-java-0200.html
http://princetonits.com/technology/using-filesystem-api-to-read-and-write-data-to-hdfs/

I got Kerberos working and am now able to generate a ticket.

But curl is not working (Windows):

curl -i  --negotiate u:qjdht93 "http://server:50070/webhdfs/v1/user/qjdht93/?op=LISTSTATUS"

It gives this error:

HTTP/1.1 401 Authentication required
Cache-Control: must-revalidate,no-cache,no-store
Date: Mon, 01 Jun 2015 15:26:37 GMT
Pragma: no-cache
Date: Mon, 01 Jun 2015 15:26:37 GMT
Pragma: no-cache
Content-Type: text/html; charset=iso-8859-1
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Version=1; Path=/; Expires=Thu, 01-Jan-1970 00:00:00 GMT; HttpOnly


Content-Length: 1416
Server: Jetty(6.1.26)

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 401 Authentication required</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /webhdfs/v1/user/qjdht93. Reason:
<pre>    Authentication required</pre></p><hr /><i><small>Powered by Jetty://</small></i><br/>
<br/>
<br/>

Please suggest

Chhaya Vishwakarma
  • From the other comments it looks like you have your answer but I wanted to add something for future readers. If you need to use Kerberos to access a foreign cluster you 1) need to make sure you have an entry in /etc/krb5.conf that tells your software where to find the KDC for the realm, and 2) use kinit -kt to verify that your Kerberos credentials are valid. Do that first and you'll save yourself a world of pain when trying to figure out why something isn't working since the error messages are often obscure. – bgiles Oct 04 '18 at 21:50

1 Answer

-2

This can be achieved through the hdfs command. All you need is a Hadoop distribution and the configuration files that are present on the namenode.

  1. Copy the Hadoop distribution to the client node, i.e. copy the complete Hadoop package to the client machine. Refer this
  2. Get a user ticket from the keytab using the kinit command, a command-line tool that ships with Java.
    a. Install the JDK on your client machine.
    b. Set JAVA_HOME, see here
    c. Create a krb5.ini file at C:\windows\krb5.ini. This file should contain the information below:
    [libdefaults]
        default_realm = REALM
    [realms]
        REALM = {
            kdc = kdcvalue    
            admin_server = kdcvalue 
            default_domain = kdcvalue 
        }
    [domain_realm]
        .kdcvalue = REALM
        kdcvalue = REALM

REALM - Server Realm name
kdcvalue - Server host name or ip address

    d. Make sure the Java bin path is set in the PATH variable on the Windows machine. Open a command prompt and type the command below to get a user ticket.

kinit -k -t keytabfile username
  3. Now you are able to put the file into HDFS using "hadoop fs -put src dest" or using Java.
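
For the "using Java" route in step 3, the keytab login that kinit performs can also be done inside the JVM. The following is only a minimal sketch, assuming the JDK's bundled Krb5LoginModule and a krb5.ini like the one above; the class names KeytabLogin and KeytabConfiguration and the entry name "keytab-login" are made up for illustration and do not come from Hadoop. If the Hadoop client libraries are on the classpath, UserGroupInformation.loginUserFromKeytab(principal, keytabPath) is the more usual way to achieve the same thing.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.Subject;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;

// Hypothetical helper: logs in to Kerberos from a keytab using only JDK classes.
class KeytabLogin {

    // In-memory JAAS configuration with a single Krb5LoginModule entry.
    static final class KeytabConfiguration extends Configuration {
        private final String principal;
        private final String keytabPath;

        KeytabConfiguration(String principal, String keytabPath) {
            this.principal = principal;
            this.keytabPath = keytabPath;
        }

        @Override
        public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
            Map<String, String> options = new HashMap<>();
            options.put("useKeyTab", "true");    // read keys from the keytab file
            options.put("keyTab", keytabPath);
            options.put("principal", principal);
            options.put("storeKey", "true");     // keep the keys in the Subject
            options.put("doNotPrompt", "true");  // never ask for a password
            return new AppConfigurationEntry[] {
                new AppConfigurationEntry(
                    "com.sun.security.auth.module.Krb5LoginModule",
                    AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
                    options)
            };
        }
    }

    // Contacts the KDC named in krb5.ini / krb5.conf and returns the authenticated Subject.
    static Subject login(String principal, String keytabPath) throws LoginException {
        LoginContext context = new LoginContext(
            "keytab-login", null, null, new KeytabConfiguration(principal, keytabPath));
        context.login();
        return context.getSubject();
    }

    public static void main(String[] args) throws LoginException {
        if (args.length != 2) {
            System.err.println("usage: java KeytabLogin <principal> <keytab file>");
            return;
        }
        Subject subject = login(args[0], args[1]);
        System.out.println("Logged in as: " + subject.getPrincipals());
    }
}
```

A run would look like `java KeytabLogin username keytabfile`, with the same username and keytab file that are passed to kinit above. Code that talks to the cluster can then be executed under the returned Subject.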
Kumar
  • 1. Copy the Hadoop distribution on the client node: what exactly do I have to copy? 2. Can you please explain how to get a ticket using the keytab in Java? It would be helpful, as I'm not a Java expert. – Chhaya Vishwakarma May 26 '15 at 09:38
  • Hey, thanks, this is really helpful. Won't the client machine become as good as an edge node? WebHDFS can also be called from a web browser, right? If I'm using WebHDFS in my Java code, do I still need the Hadoop package copied? – Chhaya Vishwakarma May 26 '15 at 15:29
  • If you are using WebHDFS then there is no need to have the Hadoop package on the client machine. Refer to the [WebHDFS REST API](https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html). – Kumar May 27 '15 at 05:43
  • You can find a similar project here. Download source and get some idea and try your own. [webhdfs-java-client](https://github.com/zxs/webhdfs-java-client) – Kumar May 27 '15 at 10:29
  • Yeah sure. But for writing and debugging the code, use a Java IDE such as Eclipse, NetBeans, etc. An IDE is more user-friendly than the command line. – Kumar May 28 '15 at 03:40
  • kinit is a Java tool. You have to install Java on your machine to run the kinit command. – Kumar May 28 '15 at 08:47
  • I set the properties in krb5.ini as you suggested, but I'm getting the error below: kinit: Configuration file does not specify default realm when parsing name – Chhaya Vishwakarma May 28 '15 at 09:09
  • Set the REALM in capital letters. Make sure you have set the correct realm and KDC name of your server machine. – Kumar May 28 '15 at 09:11
  • Got Kerberos working. Please check my edits to the question and give some pointers. I'm unable to find anything on the error. It would be great if you could help. – Chhaya Vishwakarma Jun 01 '15 at 15:32
  • You must be authenticated to use WebHDFS. Try to get a ticket using the kinit command and then try the curl. – Kumar Jun 02 '15 at 04:04
  • I successfully used kinit and gave the password, but it still gives an authentication error. After kinit I ran klist, and it does not show any granted ticket. – Chhaya Vishwakarma Jun 02 '15 at 07:32
  • For klist, you can change directory to %JAVA_HOME%\bin and then execute the klist command. You will get the details. For WebHDFS, refer to [this](http://hadoop.apache.org/docs/stable1/webhdfs.html#Authentication) – Kumar Jun 02 '15 at 08:37
  • I have installed MIT Kerberos client 4.0.1. I looked into the link. The same curl command works on Linux but fails on Windows. Is there a different command that I need to use on Windows? – Chhaya Vishwakarma Jun 02 '15 at 09:23
  • First check whether you are able to access the namenode web UI from a Windows browser. There you can open WebHDFS to list directories. If you are able to view it in the browser then you can do it from curl also. – Kumar Jun 02 '15 at 09:26
  • Below is the error I'm getting in the UI and on cmd: Permission denied when trying to open /webhdfs/v1/?op=LISTSTATUS: GSSException: Defective token detected (Mechanism level: GSSHeader did not find the right tag). I'm able to generate a Kerberos ticket, so why am I getting permission denied? – Chhaya Vishwakarma Jun 02 '15 at 09:30
  • You have to add the KDC using "ksetup". Execute the command below in a command prompt with admin privileges.
    ksetup /addkdc REALM HOST
    Replace REALM and HOST with the correct values.
    – Kumar Jun 02 '15 at 09:38
  • This step is required only on the Windows machine from which you want to access WebHDFS. – Kumar Jun 02 '15 at 09:46
  • There is no link available. I have implemented Hadoop security on Windows, so I am guiding you from my experience. – Kumar Jun 02 '15 at 12:50
  • Hi Kumar, thanks for the help. I added the KDC as you suggested, but I'm still getting the authentication error :( Do I need to change the domain as well? I have Windows Server 2012. – Chhaya Vishwakarma Jun 08 '15 at 09:31
  • Are you using MS Windows Kerberos or non-Windows Kerberos? – Chhaya Vishwakarma Jun 08 '15 at 09:45
  • I am using Active Directory on Windows Server 2012. What problem are you actually facing now? – Kumar Jun 08 '15 at 09:47
  • I'm able to get the ticket using kinit, and klist is also working, but when I run curl from cmd I get the error HTTP/1.1 401 Authentication required. Am I doing anything wrong in curl, given that I'm able to get a Kerberos ticket? I have also pasted the curl command I'm using in my question. – Chhaya Vishwakarma Jun 08 '15 at 10:20
  • Windows has its own Kerberos; is that causing the problem? – Chhaya Vishwakarma Jun 11 '15 at 12:24
  • It might be causing the problem. But I don't have any idea about the curl command; I haven't tried it before. If you are able to access WebHDFS from the browser then surely you can access it from curl too. – Kumar Jun 12 '15 at 03:51
  • @chhayavishwakarma, did you find the working solution? – Dinesh Kumar P Jun 15 '15 at 09:58
  • Not yet :( Trying, but no luck. I'm using one ID (311122) to log in to Windows and a different ID (qjth34) to authenticate to Kerberos. Can this be the problem? – Chhaya Vishwakarma Jun 15 '15 at 11:10
  • @chhayavishwakarma are you using the IP address or the hostname in the URL? Can you try this: curl -i --negotiate -u: "http://:50070/webhdfs/v1/?op=LISTSTATUS" – Sachin Janani Apr 04 '16 at 07:30
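
For readers who want to make the same LISTSTATUS call from Java instead of curl, here is a rough sketch using only JDK classes. The class name WebHdfsList and its methods are hypothetical, and port 50070 is simply the one used in the curl command in the question. The JDK's HttpURLConnection can answer a Negotiate challenge, but only when Kerberos credentials are available to the JVM; that setup depends on the environment and is not shown here, so without it this request is expected to return the same 401.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: builds a WebHDFS LISTSTATUS URL and reports the HTTP status.
class WebHdfsList {

    // Produces http://<host>:<port>/webhdfs/v1<hdfsPath>?op=LISTSTATUS,
    // percent-encoding any illegal characters in the path.
    static URI listStatusUri(String host, int port, String hdfsPath) throws URISyntaxException {
        return new URI("http", null, host, port, "/webhdfs/v1" + hdfsPath, "op=LISTSTATUS", null);
    }

    // Sends a GET and returns the status code; 401 means authentication did not succeed.
    static int requestStatus(URI uri) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) uri.toURL().openConnection();
        connection.setRequestMethod("GET");
        try {
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java WebHdfsList <namenode host> <hdfs path>");
            return;
        }
        URI uri = listStatusUri(args[0], 50070, args[1]);
        System.out.println(uri + " -> HTTP " + requestStatus(uri));
    }
}
```

Using the namenode's fully qualified hostname rather than an IP address or short name is generally the safer choice here, since Kerberos service tickets are issued for a host name.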