
Can anyone tell me: if I am using a Java application to request file upload/download operations on HDFS with a Namenode HA setup, where does this request go first? I mean, how does the client know which namenode is active?

It would be great if you could provide a workflow-type diagram or something that explains the request steps in detail (start to end).

user2846382

2 Answers


Please see the Namenode HA architecture below, along with the key entities involved in handling HDFS client requests.

[diagram: Namenode HA architecture]

Where does this request go first? I mean, how does the client know which namenode is active?

For the client/driver it doesn't matter which namenode is active, because we query HDFS with the nameservice ID rather than the hostname of a namenode. The nameservice automatically routes client requests to the active namenode.

Example: hdfs://nameservice_id/rest/of/the/hdfs/path

Explanation:

How does hdfs://nameservice_id/ work, and which configurations are involved?

In the hdfs-site.xml file:

Create a nameservice by giving it an ID (here the nameservice_id is mycluster):

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
  <description>Logical name for this new nameservice</description>
</property>

Now specify the namenode IDs that identify the namenodes in the cluster:

dfs.ha.namenodes.[$nameservice ID]:

<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
  <description>Unique identifiers for each NameNode in the nameservice</description>
</property>

Then link the namenode IDs to the namenode hosts:

dfs.namenode.rpc-address.[$nameservice ID].[$namenode ID]:

<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>machine1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>machine2.example.com:8020</value>
</property>

After that, specify the Java class that HDFS clients use to contact the active NameNode; the DFS client uses this class to determine which NameNode is currently serving client requests.

<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

Finally, the HDFS URL will look like this after these configuration changes:

hdfs://mycluster/<file_location_in_hdfs>
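
For illustration, a minimal Java client could read a file through the logical URI like this (a sketch, assuming the hdfs-site.xml above is on the client's classpath; the path /some/hdfs/file.txt is just a placeholder):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadViaNameservice {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml/hdfs-site.xml from the classpath,
        // including the HA settings configured above.
        Configuration conf = new Configuration();

        // Only the logical nameservice appears in the URI; the failover
        // proxy provider resolves whichever namenode is currently active.
        FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);

        // Placeholder path, used here only for illustration.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/some/hdfs/file.txt"))))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}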

To answer your question, I have covered only a few configurations here. Please check the detailed documentation for how Namenodes, Journalnodes, and Zookeeper machines form Namenode HA in HDFS.

mrsrinivas
  • I am using a Java application and need some guidance on the configuration object in an HDFS federated cluster: how can all active namenodes share the incoming client requests horizontally? For this I need a Java code example. – user2846382 Nov 07 '16 at 14:17

If the Hadoop cluster is configured with HA, it will have namenode IDs in hdfs-site.xml like this:

<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>namenode1,namenode2</value>
</property>

Whichever NameNode is started first will become active. You may choose to start the cluster in a specific order such that your preferred node starts first.

If you want to determine the current status of a namenode, you can use the -getServiceState command:

hdfs haadmin -getServiceState <namenode-id>
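
For example, with the namenode IDs configured above, it would report something like this (output is illustrative):

hdfs haadmin -getServiceState namenode1
active
hdfs haadmin -getServiceState namenode2
standby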

While writing the driver class, you need to set the following properties on the configuration object:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToHdfs {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Usage: pgm <hdfs:///path/to/copy> </local/path/to/copy/from>");
            System.exit(1);
        }
        // Start from an empty configuration (no defaults loaded) and
        // set the HA properties explicitly.
        Configuration conf = new Configuration(false);
        conf.set("fs.defaultFS", "hdfs://nameservice1");
        conf.set("fs.default.name", conf.get("fs.defaultFS"));
        conf.set("dfs.nameservices", "nameservice1");
        conf.set("dfs.ha.namenodes.nameservice1", "namenode1,namenode2");
        conf.set("dfs.namenode.rpc-address.nameservice1.namenode1", "hadoopnamenode01:8020");
        conf.set("dfs.namenode.rpc-address.nameservice1.namenode2", "hadoopnamenode02:8020");
        // Proxy provider that routes client calls to whichever namenode is active.
        conf.set("dfs.client.failover.proxy.provider.nameservice1",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
        Path srcPath = new Path(args[1]);
        Path dstPath = new Path(args[0]);
        // If the same file exists at the remote location, it will be overwritten.
        fs.copyFromLocalFile(false, true, srcPath, dstPath);
    }
}
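
A sample invocation could look like this (jar name and paths are hypothetical; hadoop classpath supplies the Hadoop client jars):

java -cp myapp.jar:$(hadoop classpath) CopyToHdfs hdfs://nameservice1/tmp/data.txt /home/user/data.txt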

The request will go to nameservice1 and will then be handled by the Hadoop cluster according to the namenode status (active/standby).

For more details, please refer to the HDFS High Availability documentation.

Nishu Tayal
  • I think you didn't understand my question properly. I am using a Java application to make the request. My question is: to which namenode do I send my requests? For that I need to know which namenode is active – user2846382 Mar 10 '16 at 13:55
  • @user2846382: you need to set the configuration in the driver class. Please refer to the updated answer. – Nishu Tayal Mar 10 '16 at 17:40
  • @user2846382: please accept the answer, if it solves your issue. – Nishu Tayal Mar 11 '16 at 17:29
  • @NishuTayal: I need some guidance on the configuration object in an HDFS federated cluster: how can all active namenodes share the incoming client requests horizontally? For this I need a Java code example – user2846382 Nov 07 '16 at 14:14