I have the following system:
- Windows host
- Linux guest with Docker (in VirtualBox)
I have installed HDFS in Docker on the Ubuntu guest (inside VirtualBox), using the bde2020 Hadoop images from Docker Hub. This is my docker-compose.yml:
namenode:
  image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
  container_name: namenode
  restart: always
  ports:
    - 9870:9870
    - 9000:9000
  volumes:
    - hadoop_namenode:/hadoop/dfs/name
  environment:
    - CLUSTER_NAME=test
  env_file:
    - ./hadoop.env
  networks:
    control_net:
      ipv4_address: 10.0.1.20

datanode:
  image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
  container_name: datanode
  restart: always
  ports:
    - 9864:9864
  volumes:
    - hadoop_datanode:/hadoop/dfs/data
  environment:
    SERVICE_PRECONDITION: "namenode:9870"
  env_file:
    - ./hadoop.env
  networks:
    control_net:
      ipv4_address: 10.0.1.21

resourcemanager:
  image: bde2020/hadoop-resourcemanager:2.0.0-hadoop3.2.1-java8
  container_name: resourcemanager
  restart: always
  environment:
    SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864"
  env_file:
    - ./hadoop.env
  networks:
    control_net:
      ipv4_address: 10.0.1.22

nodemanager1:
  image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.2.1-java8
  container_name: nodemanager
  restart: always
  environment:
    SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864 resourcemanager:8088"
  env_file:
    - ./hadoop.env
  networks:
    control_net:
      ipv4_address: 10.0.1.23

historyserver:
  image: bde2020/hadoop-historyserver:2.0.0-hadoop3.2.1-java8
  container_name: historyserver
  restart: always
  environment:
    SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864 resourcemanager:8088"
  volumes:
    - hadoop_historyserver:/hadoop/yarn/timeline
  env_file:
    - ./hadoop.env
  networks:
    control_net:
      ipv4_address: 10.0.1.24

volumes:
  hadoop_namenode:
  hadoop_datanode:
  hadoop_historyserver:

networks:
  processing_net:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 10.0.0.0/24
          gateway: 10.0.0.1
My hdfs-site.xml is:
<configuration>
  <property><name>dfs.namenode.datanode.registration.ip-hostname-check</name><value>false</value></property>
  <property><name>dfs.webhdfs.enabled</name><value>true</value></property>
  <property><name>dfs.permissions.enabled</name><value>false</value></property>
  <property><name>dfs.namenode.name.dir</name><value>file:///hadoop/dfs/name</value></property>
  <property><name>dfs.namenode.rpc-bind-host</name><value>0.0.0.0</value></property>
  <property><name>dfs.namenode.servicerpc-bind-host</name><value>0.0.0.0</value></property>
  <property><name>dfs.namenode.http-bind-host</name><value>0.0.0.0</value></property>
  <property><name>dfs.namenode.https-bind-host</name><value>0.0.0.0</value></property>
  <property><name>dfs.client.use.datanode.hostname</name><value>true</value></property>
  <property><name>dfs.datanode.use.datanode.hostname</name><value>true</value></property>
</configuration>
If I enter the address in the browser from Linux (inside VirtualBox):
then I can access the Hadoop web UI.
And if I enter the following in the browser from Windows (the host system, outside VirtualBox):
http://192.168.56.1:9870 then I can access it too (I have mapped this IP so I can connect from outside VirtualBox).
But the problem arises when I am browsing the web UI and want to download a file. The browser then says it can't connect to the server dcfb0bf3b42c and shows a URL like this in the address bar:
http://dcfb0bf3b42c:9864/webhdfs/v1/tmp/datalakes/myJsonTest1/part-00000-0009b521-b474-49e7-be20-40f5e8b3a7b4-c000.json?op=OPEN&namenoderpcaddress=namenode:9000&offset=0
If I replace the "dcfb0bf3b42c" part with the IP 10.0.1.21 (from Linux) or 192.168.56.1 (from Windows), it works correctly and downloads the file.
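To make the manual step concrete, here is roughly what that rewrite amounts to, sketched in Python with the requests library (the addresses and file path are the ones from above; this script is only an illustration of the behaviour, not something that is part of my setup):

import requests
from urllib.parse import urlsplit, urlunsplit

NAMENODE = "http://192.168.56.1:9870"  # namenode web UI as reached from Windows
DATANODE_HOST = "192.168.56.1:9864"    # datanode address that is reachable from Windows
PATH = "/tmp/datalakes/myJsonTest1/part-00000-0009b521-b474-49e7-be20-40f5e8b3a7b4-c000.json"

# Ask the namenode to open the file; it answers with a 307 redirect whose
# Location header points at the datanode by its container hostname
# (e.g. dcfb0bf3b42c), which is not resolvable outside the Docker network.
r = requests.get(f"{NAMENODE}/webhdfs/v1{PATH}", params={"op": "OPEN"},
                 allow_redirects=False)
location = r.headers["Location"]

# The manual fix: swap the unresolvable hostname for a reachable address,
# then follow the redirect to actually download the file.
parts = urlsplit(location)
fixed = urlunsplit(parts._replace(netloc=DATANODE_HOST))
data = requests.get(fixed).content

This is exactly the substitution I do by hand in the browser address bar, which is what I want to avoid.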
I need to automate this so I don't have to rewrite the IP by hand every time, because I need to access the HDFS data from a program (Power BI), and when it tries to read the data it fails because of this problem.
I'm new to Hadoop. Can I solve this problem by editing some configuration file?