2

I deployed Kubernetes on CoreOS with flanneld.service enabled, then started an HDFS namenode and datanode via Kubernetes replication controllers. I also created a Kubernetes service for the namenode. The namenode service IP is 10.100.220.223, while the namenode pod IP is 10.20.96.4. In my case, the namenode and the datanode happen to be on the same host, and the two pods can ping each other successfully.

However, I ran into the following two problems when trying to start the HDFS datanode:

  1. I used the namenode service IP 10.100.220.223 as fs.defaultFS in core-site.xml for the datanode. When the datanode tried to register itself with the namenode via an RPC request, the namenode got the wrong IP address for the datanode. Normally it should get the pod IP of the datanode, but in this case the docker0 inet address of the datanode host was reported to the namenode.

  2. To work around this, I used the namenode pod IP 10.20.96.4 in core-site.xml for the datanode. This time the datanode does not start at all. The error reports that "k8s_POD-2fdae8b2_namenode-controller-keptk_default_55b8147c-881f-11e5-abad-02d07c9f6649_e41f815f.bridge" is used as the namenode host instead of the namenode pod IP.

I searched online for this issue but found nothing that helped. Could you please help me out? Thanks.

Abdulla Nilam
ztao1987

2 Answers

3

Use the latest Kubernetes and pass the parameter --proxy-mode=iptables to the kube-proxy start command. With that change, the HDFS cluster works.

Matthew Murdoch
ztao1987
0

The issue is probably due to traffic going through kube-proxy, which, being a userspace proxy, makes all source IPs appear the same. I don't know whether there is a way to specify the datanode IP through the application-level protocol. If it can be provided via a command-line argument or an XML config file, you could wrap the binary in a shell script that first grabs the IP:

 IP=$(ip -4 -o addr show eth0 | grep -Po 'inet \K[\d.]+')

Then pass it as an argument, or write it out to the config file, before starting the datanode binary.
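As a rough sketch of such a wrapper, the two helpers below extract the address from `ip -4 -o addr show` output and substitute it into a config template. The `@POD_IP@` placeholder, the template file name, and the idea that some HDFS property would accept the address are all assumptions made for illustration; I have not confirmed that the datanode honours such a setting. The extraction uses awk rather than `grep -P` only so that it also runs where PCRE grep is unavailable.

```shell
#!/bin/sh
# Print the first IPv4 address found in `ip -4 -o addr show <iface>` output (stdin).
extract_ipv4() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "inet") {
                split($(i + 1), parts, "/")   # drop the /prefix length
                print parts[1]
                exit
            }
        }
    }'
}

# Replace the hypothetical @POD_IP@ placeholder in a template (stdin) with $1.
render_config() {
    sed "s/@POD_IP@/$1/g"
}

# Intended use inside the container (file names are hypothetical):
#   IP=$(ip -4 -o addr show eth0 | extract_ipv4)
#   render_config "$IP" < hdfs-site.xml.template > hdfs-site.xml
#   exec hdfs datanode
```

Using `exec` for the final step keeps the datanode as the container's main process, so signals from the container runtime reach it directly.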

See https://github.com/coreos/flannel/issues/363 and https://groups.google.com/forum/#!search/hdfs%2420flannel/google-containers/P4uh7y383oo/bPzIRaxhs5gJ for more info.