
I need to use an HDFS cluster from a remote desktop through the Java API. Everything works fine until it comes to write access. If I try to create any file, I receive an access permission exception. The path looks good, but the exception indicates my remote desktop user name, which is of course not the user I need to access the required HDFS directory.

The questions are:

  • Is there any way to represent a different user name using 'simple' authentication in the Java API?
  • Could you please point me to a good explanation of authentication / authorization schemes in Hadoop / HDFS, preferably with Java API examples?

Yes, I already know 'whoami' could be overridden in this case using a shell alias, but I prefer to avoid solutions like that. Another specific here is that I dislike tricks such as pipes through SSH and scripts. I'd like to perform everything using just the Java API. Thank you in advance.

Roman Nikitchenko

1 Answer


After some studying I came to the following solution:

  • I don't actually need a full Kerberos solution; currently it is enough that clients can run HDFS requests as any user. The environment itself is considered secure.
  • This gives me a solution based on the Hadoop UserGroupInformation class. In the future I can extend it to support Kerberos.

Sample code, probably useful for people both for 'fake authentication' and for remote HDFS access:

package org.myorg;

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class HdfsTest {

    public static void main(String[] args) {

        try {
            // 'Fake' the remote user: with simple authentication the name is
            // taken on trust, no password or Kerberos ticket is involved.
            UserGroupInformation ugi
                = UserGroupInformation.createRemoteUser("hbase");

            // Every HDFS call inside run() is performed as the 'hbase' user.
            ugi.doAs(new PrivilegedExceptionAction<Void>() {

                @Override
                public Void run() throws Exception {

                    Configuration conf = new Configuration();
                    conf.set("fs.defaultFS", "hdfs://1.2.3.4:8020/user/hbase");
                    conf.set("hadoop.job.ugi", "hbase");

                    FileSystem fs = FileSystem.get(conf);

                    // Write access check: create a file in the user's directory.
                    fs.createNewFile(new Path("/user/hbase/test"));

                    // Read access check: list the directory contents.
                    FileStatus[] status = fs.listStatus(new Path("/user/hbase"));
                    for (FileStatus entry : status) {
                        System.out.println(entry.getPath());
                    }
                    return null;
                }
            });
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
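
For reference, the same thing can be written more compactly: the three-argument FileSystem.get(URI, Configuration, String) overload performs the createRemoteUser() / doAs() wrapping internally. A minimal sketch, assuming the same cluster address and 'hbase' user as above (the class name is illustrative):

package org.myorg;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsShortTest {

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();

        // The user name passed here is wrapped in createRemoteUser()/doAs()
        // by FileSystem.get() itself, so every call on 'fs' runs as 'hbase'.
        FileSystem fs = FileSystem.get(
            new URI("hdfs://1.2.3.4:8020"), conf, "hbase");

        fs.createNewFile(new Path("/user/hbase/test"));
    }
}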

Useful reference for those who have a similar problem:

  • Cloudera blog post "Authorization and Authentication In Hadoop". Short, focused on a simple explanation of Hadoop security approaches. No information specific to a Java API solution, but good for a basic understanding of the problem.

UPDATE:
An alternative for those who use the command-line hdfs or hadoop utility and do not need a local user:

 HADOOP_USER_NAME=hdfs hdfs dfs -put /root/MyHadoop/file1.txt /

What you actually do is read the local file in accordance with your local permissions, but when placing the file on HDFS you are authenticated as the user hdfs.

This has pretty similar properties to the API code illustrated above; a Java-side sketch of the same idea follows the list:

  1. You don't need sudo.
  2. You don't actually need an appropriate local user 'hdfs'.
  3. You don't need to copy anything or change permissions, because of the previous points.
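
If I recall correctly, the Java client honours the same HADOOP_USER_NAME variable under simple authentication (UserGroupInformation picks it up when determining the login user), so the trick is not limited to the command-line utilities. A minimal sketch, assuming the variable is set by the launching shell and reusing the cluster address from above (the class name is illustrative):

package org.myorg;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsEnvUserTest {

    // Run with the variable set in the environment, for example:
    //   HADOOP_USER_NAME=hdfs hadoop jar myjar.jar
    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://1.2.3.4:8020");

        // No explicit doAs(): with simple authentication the login user is
        // taken from HADOOP_USER_NAME instead of the local OS account.
        FileSystem fs = FileSystem.get(conf);

        fs.createNewFile(new Path("/user/hdfs/test"));
    }
}
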
Roman Nikitchenko
  • I've stumbled upon the same problem as yours. I'm trying to send a Hadoop job from a remote client to the cluster that will execute it. In my case the problem is that **Cloudera's Hadoop 2.0.0 (Hadoop 2.0.0-cdh4.3.1) doesn't provide the UserGroupInformation class** that you've used. It seems that the corresponding Apache Hadoop versions don't provide it either. There is just an enum named UserGroupInformation - [link](http://archive.cloudera.com/cdh4/cdh/4/hadoop/api/org/apache/hadoop/security/UserGroupInformation.AuthenticationMethod.html). How could it be done in such a case then, in your opinion? – falconepl Sep 05 '13 at 11:39
  • It's there, it's just not Cloudera's. I'm using the 2.0.0-cdh4.3.1 hadoop client right now. – Roman Nikitchenko Sep 05 '13 at 15:26
  • What do you mean by saying it's there? I've checked the Apache Hadoop 2.0.6 API [[link](http://hadoop.apache.org/docs/r2.0.6-alpha/api/index.html)] as well as the 2.1.0 API [[link](http://hadoop.apache.org/docs/r2.1.0-beta/api/index.html)] (those Javadocs that Apache provides on their website) and unfortunately there is no `UserGroupInformation` class, just the enum that doesn't help much. And by the way, isn't the `2.0.0-cdh4.3.1` Hadoop that you've mentioned a Cloudera Hadoop distribution? – falconepl Sep 05 '13 at 15:57
  • The main point here is: CDH4 actually supports the 0.20 client, which is recommended. Just look here: http://blog.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/ and then here: http://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_topic_5_2.html. As you can see, they recommend using the 0.20 client. – Roman Nikitchenko Sep 05 '13 at 16:24
  • Ok, I see. If I got it right, CDH4.3 supports both the v0.20.2 (MapReduce) and v2.0.0 (MapReduce 2 - YARN) versions - [link](http://www.cloudera.com/content/cloudera/en/products/cdh/projects-and-versions.html). The whole versioning thing is pretty obscure. But anyway, I still cannot find either a Hadoop API Javadoc or a hadoop-core JAR for CDH4.3 in Cloudera's repository [[link](https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hadoop/hadoop-core/)] that has a `UserGroupInformation` class. – falconepl Sep 05 '13 at 18:34
  • Honestly, I always use the 1.0.4 documentation and it looks good enough. For really tough situations I just download the CDH4 hadoop sources or javadoc. For example, probably the most useful things for you: http://maven.tempo-db.com/artifactory/list/cloudera/org/apache/hadoop/hadoop-client/2.0.0-cdh4.3.1/ and http://maven.tempo-db.com/artifactory/list/cloudera/org/apache/hadoop/hadoop-client/2.0.0-mr1-cdh4.3.1/ – Roman Nikitchenko Sep 05 '13 at 18:52
  • If you execute it like `java -jar myjar.jar`, the file system will be LocalFileSystem. To get DistributedFileSystem, execute your jar like `hadoop jar myjar.jar` or `yarn jar myjar.jar`. – Roman Kazanovskyi Nov 22 '17 at 09:02