
I am very curious to know how HDFS reserves and allocates storage space on a DataNode.

Say I have a 500GB hard disk on one of the DataNodes, of which 400GB is allocated to the /data partition, and Hadoop is configured to use it.

Out of this 400GB, how much space will be taken by the Hadoop daemons for storing the input splits (blocks)?

Does it reserve all the storage space up front, or does it allocate on demand?

I also wanted to know whether formatting the NameNode while setting up the cluster has anything to do with this.

Many Thanks...

suresiva

1 Answer


The property dfs.datanode.data.dir determines where on the local filesystem a DFS DataNode should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all of the named directories, typically on different devices. Directories that do not exist are ignored.
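For example, the property can be set in hdfs-site.xml like this (the directory paths below are hypothetical; use whatever local paths hold your data partitions):

```xml
<!-- hdfs-site.xml: two local data directories, typically on separate disks -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/hdfs/dn1,/data/hdfs/dn2</value>
</property>
```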

Each block replica on a DataNode is represented by two files in the local native filesystem. The first file contains the data itself and the second file records the block's metadata including checksums for the data and the generation stamp. The size of the data file equals the actual length of the block and does not require extra space to round it up to the nominal block size as in traditional filesystems. Thus, if a block is half full it needs only half of the space of the full block on the local drive.
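The space arithmetic above can be sketched with a small example (illustrative only, not HDFS code; the 128 MB nominal block size is the modern HDFS default, configurable via dfs.blocksize):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # nominal HDFS block size (assumed 128 MB default)

def block_lengths(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Split a file into block lengths; the last block holds only the remainder."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

file_size = 200 * 1024 * 1024          # a 200 MB file
blocks = block_lengths(file_size)      # one full 128 MB block + one 72 MB block

# Because a block's data file occupies only its actual length, the space
# consumed per replica is 200 MB, not 2 * 128 MB = 256 MB as it would be
# under a fixed-allocation scheme.
print(len(blocks), sum(blocks) == file_size)  # 2 True
```

In other words, the nominal block size is an upper bound on a block's length, not a pre-reserved allocation, which is why storage is consumed on demand rather than up front.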

The NameNode format step is explained in this Link.

donut