
I'm using a Docker stack that runs, on the same machine, a Hadoop NameNode, two DataNodes, two NodeManagers, a ResourceManager, a History Server, and other technologies.

I encountered an issue related to the HDFS Configured Capacity that is shown in the HDFS UI.

I'm using a machine with a 256 GB disk and the two-DataNode setup mentioned above. Instead of splitting the machine's total capacity between the two nodes, HDFS counts the whole disk once per DataNode, reporting a Configured Capacity of 226.87 GB for each of them.

Any thoughts on how to make HDFS show the right capacity?

Here is the portion of the Docker Compose file that defines the Hadoop services mentioned above:

    services:
      # Hadoop master
      namenode:
        image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
        container_name: namenode
        ports:
          - 9870:9870
          - 8020:8020
        volumes:
          - ./namenode/home/${ADMIN_NAME:?err}:/home/${ADMIN_NAME:?err}
          - ./namenode/hadoop-data:/hadoop-data
          - ./namenode/entrypoint.sh:/entrypoint.sh
          - hadoop-namenode:/hadoop/dfs/name
        env_file:
          - ./hadoop.env
          - .env
        networks:
          - hadoop

      resourcemanager:
        restart: always
        image: bde2020/hadoop-resourcemanager:2.0.0-hadoop3.2.1-java8
        container_name: resourcemanager
        ports:
          - 8088:8088
        environment:
          SERVICE_PRECONDITION: "namenode:9870 datanode1:9864"
        env_file:
          - ./hadoop.env
        networks:
          - hadoop

      # Hadoop slave 1
      datanode1:
        image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
        container_name: datanode1
        volumes:
          - hadoop-datanode-1:/hadoop/dfs/data
        environment:
          SERVICE_PRECONDITION: "namenode:9870"
        env_file:
          - ./hadoop.env
        networks:
          - hadoop

      nodemanager1:
        image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.2.1-java8
        container_name: nodemanager1
        volumes:
          - ./nodemanagers/entrypoint.sh:/entrypoint.sh
        environment:
          SERVICE_PRECONDITION: "namenode:9870 datanode1:9864 resourcemanager:8088"
        env_file:
          - ./hadoop.env
          - .env
        networks:
          - hadoop

      # Hadoop slave 2
      datanode2:
        image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
        container_name: datanode2
        volumes:
          - hadoop-datanode-2:/hadoop/dfs/data
        environment:
          SERVICE_PRECONDITION: "namenode:9870"
        env_file:
          - ./hadoop.env
        networks:
          - hadoop

      nodemanager2:
        image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.2.1-java8
        container_name: nodemanager2
        volumes:
          - ./nodemanagers/entrypoint.sh:/entrypoint.sh
        environment:
          SERVICE_PRECONDITION: "namenode:9870 datanode2:9864 resourcemanager:8088"
        env_file:
          - ./hadoop.env
          - .env
        networks:
          - hadoop

      historyserver:
        image: bde2020/hadoop-historyserver:2.0.0-hadoop3.2.1-java8
        container_name: historyserver
        ports:
          - 8188:8188
        environment:
          SERVICE_PRECONDITION: "namenode:9870 datanode1:9864 datanode2:9864 resourcemanager:8088"
        volumes:
          - hadoop-historyserver:/hadoop/yarn/timeline
        env_file:
          - ./hadoop.env
        networks:
          - hadoop

1 Answer


You will need to create the Docker volumes with a defined size that fits on your machine, and then point each DN at its own volume. When the DN inspects the size of its volumes, it should then report the size of the volume, rather than the capacity of the entire machine, as its capacity.
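A minimal sketch of the idea as a Compose fragment, assuming you have prepared a fixed-size filesystem mounted on the host at `/mnt/hdfs-dn1` (the path is hypothetical): bind-mount it into the DataNode in place of the named volume.

```yaml
  # Hypothetical: /mnt/hdfs-dn1 is a host mount whose filesystem is
  # deliberately smaller than the whole disk (e.g. an LVM logical
  # volume or a loopback file). The DN then reports that
  # filesystem's size as its capacity.
  datanode1:
    image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
    container_name: datanode1
    volumes:
      - /mnt/hdfs-dn1:/hadoop/dfs/data
```

Repeat with a second mount (e.g. `/mnt/hdfs-dn2`) for `datanode2`, so each DN sees only its own fixed-size filesystem.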

Stephen ODonnell
  • Thank you for the reply. Unfortunately, I tried to edit my docker-compose to give a specific size to the volumes each DN is already using (hadoop-datanode-1 and hadoop-datanode-2), as you suggested, but with no success. Can you show me a quick example of how to do it? I couldn't find one anywhere. – Diogo Rodrigues Nov 24 '21 at 10:16
  • Surprisingly (to me), it seems that setting a volume size in Docker is not really possible, at least with the default storage driver — see https://stackoverflow.com/questions/52089499/create-docker-volume-with-limited-size?noredirect=1&lq=1 for example, and further searching is not giving me anything that suggests otherwise. One option seems to be to use LVM and create a series of mounts on your host of your desired size, and then use them for the volumes. However, that would reserve the space completely for each DN, and would make your setup less portable to other machines. – Stephen ODonnell Nov 25 '21 at 11:52
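Following the LVM/mount idea in the comment above, a rough sketch using a loopback file instead of LVM (all paths and sizes are hypothetical, and the mount steps need root):

```shell
# Create a sparse 100 GB backing file and format it as a fixed-size
# ext4 filesystem (-F is needed because the target is a regular file).
truncate -s 100G /opt/hdfs/dn1.img
mkfs.ext4 -q -F /opt/hdfs/dn1.img

# Mount it via a loop device (requires root) and hand the mount point
# to datanode1 as a bind mount instead of the named volume.
sudo mkdir -p /mnt/hdfs-dn1
sudo mount -o loop /opt/hdfs/dn1.img /mnt/hdfs-dn1
```

The DataNode inspecting `/hadoop/dfs/data` would then see a 100 GB filesystem rather than the whole disk. As the comment notes, this reserves the space up front and ties the Compose file to host-specific mounts.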