5

I'm configuring the Hadoop 2.2.0 stable release with an HA NameNode, but I don't know how to configure remote access to the cluster.

I have the HA NameNode configured with manual failover and dfs.nameservices defined, and I can access HDFS via the nameservice from all the nodes inside the cluster, but not from outside.

I can perform operations on HDFS by contacting the active NameNode directly, but I don't want that; I want to contact the cluster and then be redirected to the active NameNode. I think that's the normal configuration for an HA cluster.

Does anyone know how to do that?

(Thanks in advance.)

BAndrade
  • On your client node, have you configured core-site.xml with the appropriate properties (fs.default.name for the cluster name, and ha.zookeeper.quorum for the list of ZK servers)? Can you post your current core-site.xml back into your original question? – Chris White Nov 01 '13 at 21:36
  • I am accessing HDFS via WebHDFS, so I don't have Hadoop installed on my client node. – BAndrade Nov 04 '13 at 10:02
  • Does no one have an opinion to share on this question? – BAndrade Nov 15 '13 at 16:10
  • Also see http://stackoverflow.com/questions/26648214/any-command-to-get-active-namenode-for-nameservice-in-hadoop for info on how to figure out which is the active namenode. – Erik Forsberg Sep 29 '15 at 09:46

4 Answers

2

You have to add more properties to hdfs-site.xml:

<property>
    <name>dfs.ha.namenodes.myns</name>
    <value>machine-98,machine-99</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.myns.machine-98</name>
    <value>machine-98:8100</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.myns.machine-99</name>
    <value>machine-99:8100</value>
</property>

<property>
    <name>dfs.namenode.http-address.myns.machine-98</name>
    <value>machine-98:50070</value>
</property>

<property>
    <name>dfs.namenode.http-address.myns.machine-99</name>
    <value>machine-99:50070</value>
</property>
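
The properties above name the individual NameNodes, but for a client outside the cluster to be transparently redirected to the active one, it also needs the logical nameservice as its default filesystem plus a client failover proxy provider. A sketch of the standard Hadoop 2.x HA client properties, assuming the nameservice is called `myns` as above:

```xml
<!-- hdfs-site.xml: declare the nameservice and how the client fails over -->
<property>
    <name>dfs.nameservices</name>
    <value>myns</value>
</property>

<property>
    <name>dfs.client.failover.proxy.provider.myns</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<!-- core-site.xml: address the nameservice, not an individual NameNode -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://myns</value>
</property>
```

With this in place a Hadoop client can open `hdfs://myns/...` and the proxy provider will locate the active NameNode for it.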
Dragonborn
1

You need to contact one of the NameNodes (as you're currently doing) - there is no separate cluster node to contact.

The Hadoop client code knows the addresses of the two NameNodes (from core-site.xml) and can identify which is the active and which is the standby. There might be a way to interrogate a ZooKeeper node in the quorum to identify the active/standby (maybe; I'm not sure), but you might as well check one of the NameNodes - you have a 50/50 chance it's the active one.

I'd have to check, but you might be able to query either if you're just reading from HDFS.
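
The check-one-and-fail-over idea can be sketched entirely client-side, which is roughly what the asker ends up doing with pywebhdfs. A minimal sketch - the `StandbyError` class and the endpoint strings are illustrative assumptions, not part of any official Hadoop client API:

```python
# Minimal sketch of client-side NameNode failover for WebHDFS-style access.
# A real client would map its library's standby error (e.g. a response
# mentioning "StandbyException") onto StandbyError below.

class StandbyError(Exception):
    """Raised when a NameNode reports that it is in standby state."""

def run_with_failover(namenodes, operation):
    """Try each NameNode in turn; skip any that reports standby state."""
    last_error = None
    for nn in namenodes:
        try:
            return operation(nn)
        except StandbyError as exc:
            last_error = exc  # this one is standby; try the next
    raise RuntimeError("no active NameNode found") from last_error
```

For example, `run_with_failover(["machine-98:50070", "machine-99:50070"], do_request)` would return the result from whichever NameNode accepts the request.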

Chris White
  • First of all, thanks for the reply. I'm using pywebhdfs, so it doesn't use the core-site.xml file. What I'm doing is storing the two NameNode addresses; if an operation fails, I look in the exception for "StandbyException", and if it's there, I try the other NameNode... – BAndrade Nov 26 '13 at 10:08
1

For the active NameNode you can always ask ZooKeeper. You can get the active NameNode from the ZK path below:

/hadoop-ha/namenodelogicalname/ActiveStandbyElectorLock 
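
Reading that znode can be sketched with any ZooKeeper client that offers a `get(path)` call returning `(data, stat)` (the Python kazoo library's `KazooClient` has this shape; using it here is an assumption, since the asker works in Python). Note the znode's payload is a serialized `ActiveNodeInfo` protobuf containing the active NameNode's hostname and port, so a complete solution would still need to decode it:

```python
# Sketch: read the HA elector lock znode for a given nameservice.
# "zk" is assumed to be a kazoo-style client whose get(path) returns
# a (data, stat) tuple; the nameservice name is your dfs.nameservices value.

def active_lock_data(zk, nameservice):
    """Return the raw bytes stored in the ActiveStandbyElectorLock znode.

    The bytes are a serialized ActiveNodeInfo protobuf identifying the
    currently active NameNode (hostname, port, nameservice id).
    """
    path = "/hadoop-ha/%s/ActiveStandbyElectorLock" % nameservice
    data, _stat = zk.get(path)
    return data
```

With kazoo this would be used roughly as `zk = KazooClient(hosts="zk1:2181"); zk.start(); active_lock_data(zk, "myns")` - the host string and nameservice are placeholders.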
Zach Saucier
1

There are two ways to resolve this situation in Java client code:

  1. Put core-site.xml and hdfs-site.xml on your classpath and load them into a `Configuration` via `addResource`.

  2. Set the Hadoop configuration programmatically via `conf.set` - e.g. `fs.defaultFS`, `dfs.nameservices`, the `dfs.ha.namenodes.*` and `dfs.namenode.rpc-address.*` properties, and `dfs.client.failover.proxy.provider.*` - before creating the `FileSystem`.

xluren