I have an HBase + HDFS setup in which the HBase master, regionservers, HDFS namenode, and datanodes are each containerized.

When running all of these containers on a single host VM, things work fine: I can use the Docker container names directly and set configuration variables such as:

CORE_CONF_fs_defaultFS: hdfs://namenode:9000

for both the regionserver and datanode. The system works as expected in this configuration.
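
For context, on the single host everything shares one Docker network, so the layout looks roughly like the sketch below (image names and the service layout are simplified placeholders, not my exact files):

version: "3"
services:
  namenode:
    image: my-hadoop-namenode        # placeholder image name
  datanode:
    image: my-hadoop-datanode        # placeholder image name
    environment:
      CORE_CONF_fs_defaultFS: hdfs://namenode:9000   # "namenode" resolves via the shared Docker network
  regionserver:
    image: my-hbase-regionserver     # placeholder image name
    environment:
      CORE_CONF_fs_defaultFS: hdfs://namenode:9000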

When attempting to distribute these across multiple host VMs, however, I run into issues.

I updated the config variables above to look like:

CORE_CONF_fs_defaultFS: hdfs://hostname:9000

and made sure the namenode container exposes port 9000 and maps it to the host machine's port 9000.
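
Concretely, the change looks roughly like this (hostname below is a placeholder for the namenode VM's actual hostname):

# on the namenode VM
services:
  namenode:
    ports:
      - "9000:9000"    # publish the namenode RPC port on the host

# on the datanode / regionserver VMs
environment:
  CORE_CONF_fs_defaultFS: hdfs://hostname:9000   # hostname = the namenode VM's hostname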

It looks like the names are not resolving correctly when I use the hostname; the error I see in the datanode logs is:

2019-08-24 05:46:08,630 INFO impl.FsDatasetAsyncDiskService: Deleted BP-1682518946-<ip1>-1566622307307 blk_1073743161_2337 URI file:/hadoop/dfs/data/current/BP-1682518946-<ip1>-1566622307307/current/rbw/blk_1073743161
2019-08-24 05:47:36,895 INFO datanode.DataNode: Receiving BP-1682518946-<ip1>-1566622307307:blk_1073743166_2342 src: /<ip3>:48396 dest: /<ip2>:9866
2019-08-24 05:47:36,897 ERROR datanode.DataNode: <hostname>-datanode:9866:DataXceiver error processing WRITE_BLOCK operation  src: /<ip3>:48396 dst: /<ip2>:9866
java.nio.channels.UnresolvedAddressException
    at sun.nio.ch.Net.checkAddress(Net.java:101)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:786)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:173)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:107)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:290)
    at java.lang.Thread.run(Thread.java:748)

Here, <hostname>-datanode is the name of the datanode container, and the IPs are various container IPs.

I'm wondering if I'm missing some configuration variable that would let containers on other VMs connect to the namenode, or some other change that would allow this system to be distributed correctly. For example, is the system expecting the containers to be named a certain way?
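
For instance, I know HDFS has hostname-related properties like dfs.client.use.datanode.hostname and dfs.datanode.use.datanode.hostname; if something along those lines is what's needed here, then in the env-variable convention I'm using above I would presumably set them as something like:

HDFS_CONF_dfs_client_use_datanode_hostname: "true"      # assuming HDFS_CONF_ maps to hdfs-site.xml the way CORE_CONF_ maps to core-site.xml
HDFS_CONF_dfs_datanode_use_datanode_hostname: "true"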
