
I have the following system:

  • Windows host
  • Linux guest with Docker (in VirtualBox)

I have installed HDFS in Docker (Ubuntu, VirtualBox), using the bde2020 Hadoop images from Docker Hub. This is my docker-compose file:

services:
  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
    container_name: namenode
    restart: always
    ports:
      - 9870:9870
      - 9000:9000
    volumes:
      - hadoop_namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop.env
    networks:
      control_net:
        ipv4_address: 10.0.1.20
  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
    container_name: datanode
    restart: always
    ports:
      - 9864:9864
    volumes:
      - hadoop_datanode:/hadoop/dfs/data
    environment:
      SERVICE_PRECONDITION: "namenode:9870"
    env_file:
      - ./hadoop.env
    networks: 
      control_net:
        ipv4_address: 10.0.1.21
  resourcemanager:
    image: bde2020/hadoop-resourcemanager:2.0.0-hadoop3.2.1-java8
    container_name: resourcemanager
    restart: always
    environment:
      SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864"
    env_file:
      - ./hadoop.env
    networks: 
      control_net:
        ipv4_address: 10.0.1.22
  nodemanager1:
    image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.2.1-java8
    container_name: nodemanager
    restart: always
    environment:
      SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864 resourcemanager:8088"
    env_file:
      - ./hadoop.env
    networks: 
      control_net:
        ipv4_address: 10.0.1.23
  historyserver:
    image: bde2020/hadoop-historyserver:2.0.0-hadoop3.2.1-java8
    container_name: historyserver
    restart: always
    environment:
      SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864 resourcemanager:8088"
    volumes:
      - hadoop_historyserver:/hadoop/yarn/timeline
    env_file:
      - ./hadoop.env
    networks: 
      control_net:
        ipv4_address: 10.0.1.24        
volumes: 
  hadoop_namenode:
  hadoop_datanode:
  hadoop_historyserver:   
networks:
  control_net:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 10.0.1.0/24
          gateway: 10.0.1.1

My hdfs-site.xml is:

<configuration>

<property><name>dfs.namenode.datanode.registration.ip-hostname-check</name><value>false</value></property>
<property><name>dfs.webhdfs.enabled</name><value>true</value></property>
<property><name>dfs.permissions.enabled</name><value>false</value></property>
<property><name>dfs.namenode.name.dir</name><value>file:///hadoop/dfs/name</value></property>
<property><name>dfs.namenode.rpc-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.servicerpc-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.http-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.https-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.client.use.datanode.hostname</name><value>true</value></property>
<property><name>dfs.datanode.use.datanode.hostname</name><value>true</value></property>
</configuration>

If I enter the following in the browser from Linux (inside VirtualBox):

http://10.0.1.20:9870

then I can access the Hadoop web UI.

And if I enter the following in the browser from Windows (the host system, outside VirtualBox):

http://192.168.56.1:9870

then I can access it too (I have mapped this IP to be able to connect from outside of VirtualBox).

But the problem arises when I am navigating the web UI and want to download a file. The browser then says it can't connect to the server dcfb0bf3b42c and shows a URL like this in the address bar:

http://dcfb0bf3b42c:9864/webhdfs/v1/tmp/datalakes/myJsonTest1/part-00000-0009b521-b474-49e7-be20-40f5e8b3a7b4-c000.json?op=OPEN&namenoderpcaddress=namenode:9000&offset=0

If I change the "dcfb0bf3b42c" part to the IP 10.0.1.21 (from Linux) or 192.168.56.1 (from Windows), it works correctly and downloads the file.
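
For reference, the manual fix amounts to the following when done programmatically (a sketch, assuming Python with the requests library; the file path and addresses are placeholders based on the examples above):

from urllib.parse import urlsplit, urlunsplit
import requests

NAMENODE = "http://192.168.56.1:9870"    # NameNode web address as seen from Windows
DATANODE_IP = "192.168.56.1"             # address that actually reaches the DataNode
path = "/tmp/datalakes/myJsonTest1/example.json"   # hypothetical file path

# The NameNode answers OPEN with a 307 redirect whose Location header points
# at http://<datanode-hostname>:9864/..., which is not resolvable from outside.
r = requests.get(f"{NAMENODE}/webhdfs/v1{path}?op=OPEN", allow_redirects=False)
loc = urlsplit(r.headers["Location"])

# Replace the container hostname with the reachable IP and follow the redirect.
fixed = urlunsplit(loc._replace(netloc=f"{DATANODE_IP}:{loc.port or 9864}"))
data = requests.get(fixed).content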

I need to automate this process to avoid writing the IP by hand every time, because I need to use a program (Power BI) to access the HDFS data, and when it tries to access the data it fails because of this problem.

I'm new to Hadoop. Can I solve this problem by editing any configuration file?

David Zamora
  • can you try adding the hostname option for each of the containers/services you use, for example `hostname: historyserver`? – sathya Aug 08 '20 at 18:20
  • @smart_coder With those changes, it's still not able to download the files and the address bar shows: http://datanode:9864/webhdfs/v1/tmp/datalakes/myJsonTest1/part-00000-0011a51e-c0af-4851-a2af-4ab8384a940d-c000.json? (...) I have also tried writing hostname: 'hereContainerIP' and it still doesn't work; the address bar then shows: http://0.0.0.10:9864/webhdfs/v1/tmp/datalakes/myJsonTest1/part- (...) (I don't know where that 0.0.0.10 comes from, since my containers' IPs are something like 10.0.1.20, 10.0.1.21, etc.) – David Zamora Aug 08 '20 at 18:54

2 Answers


Finally, I found the solution to this problem.

The steps are:

1- Use the hostname tag in the docker-compose file for all the services, as @smart_coder suggested in a comment:

hostname: datanode
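
For example, applied to the datanode service of the docker-compose file from the question (a sketch; the rest of the service definition stays as it was):

  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
    container_name: datanode
    hostname: datanode   # gives the container a stable, predictable hostname
    # ...the remaining keys (ports, volumes, env_file, networks) stay unchanged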

2- Edit (in Linux) the /etc/hosts file and add the IP that routes to my service (in this case I needed to map 'datanode' to its IP), so I added this line to the /etc/hosts file:

192.168.56.1 datanode

(This is a real IPv4 address; if I instead add 10.0.1.21, which is a Docker IP created in my docker-compose, it also works from Linux, but I'm not sure whether it would work when accessing from Windows.) With this second step we resolve the name 'datanode' to the IP 192.168.56.1, and this works (only) inside my Linux guest.

But please remember from my first comment that I have mapped my Windows IP (192.168.56.1) to my Docker (Linux) IP (10.0.1.21), so if in your case you are using only Linux, you can use the IP created in your docker-compose file and it will work.

3- Edit (in Windows) the hosts file by following these steps:

  • Press the Windows key
  • Type Notepad
  • Right click -> Run as administrator
  • From Notepad, open the file C:\Windows\System32\Drivers\etc\hosts (C: is my hard drive, so the path may differ if your hard disk has another letter).
  • I added:

192.168.56.1 datanode

  • Save

This third step resolves the name 'datanode' to the IP 192.168.56.1 for the Windows host. After these steps I am able to download the files both from my Linux guest (inside VirtualBox) and from my Windows host.
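
To check the mapping from the Windows host, a quick sketch (assuming Python with the requests library is available; the file path is a placeholder):

import socket
import requests

# The hosts entry should make 'datanode' resolve to 192.168.56.1
print(socket.gethostbyname("datanode"))    # expected: 192.168.56.1

# With the name resolvable, the WebHDFS redirect can now be followed automatically
url = "http://192.168.56.1:9870/webhdfs/v1/tmp/datalakes/myJsonTest1/example.json?op=OPEN"
print(len(requests.get(url).content))      # requests follows the redirect to datanode:9864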

David Zamora

I've set hostname: localhost for a container based on the cybermaggedon/hadoop:2.10.0 image. So the NameNode returns localhost as the DataNode host and I can access it without any issues:

  hadoop:
    image: cybermaggedon/hadoop:2.10.0
    hostname: localhost
    ports:
      - "50010:50010"
      - "50070:50070"
      - "50075:50075"
      - "50090:50090"
      - "9000:9000"
    volumes:
      - ./data/hadoop/data:/data

I don't think it's the right thing to do, but it works in my case; e.g. I can upload files via the NameNode Web UI.

UPD 2023-07-18: I don't have permission to edit the hosts file, so I've started using zeroconf to announce fake hostnames locally (e.g. "datanode") that point to 127.0.0.1. It works seamlessly. Almost. The only issue is that Chrome often fails to resolve the address and I have to press F5.
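
For reference, a minimal sketch of announcing a name over mDNS with the python-zeroconf package; whether a bare name like datanode (without the .local suffix) resolves this way depends on the local resolver, so treat this as an illustration of the approach rather than the exact setup used here:

import socket
from zeroconf import ServiceInfo, Zeroconf

# Advertise a host called datanode.local that points at the loopback address
info = ServiceInfo(
    "_http._tcp.local.",
    "datanode._http._tcp.local.",
    addresses=[socket.inet_aton("127.0.0.1")],
    port=9864,
    server="datanode.local.",
)
zc = Zeroconf()
zc.register_service(info)
input("Announcing datanode.local -> 127.0.0.1, press Enter to stop...")
zc.unregister_service(info)
zc.close()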

Winand