Thanks everyone for your input. I have done some research on the topic and am sharing my findings below.
- Any static number is a magic number. I propose the block-count threshold be: heap memory (in GB) x 1 million x comfort percentage (say 50%).
Why?
Rule of thumb: 1 GB of heap per 1 million blocks (Cloudera [1]).
The actual amount of heap memory required by the NameNode turns out to be much lower:
Heap needed = (number of blocks + number of inodes (files + folders)) x object size (150-300 bytes [1])
For 1 million small files: heap needed = (1M + 1M) x 300 bytes ≈ 572 MB <== much smaller than the rule of thumb.
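To make the two calculations above concrete, here is a minimal Python sketch (the 300-byte object size and the 1 GB per 1M blocks rule come from [1]; the function names and the 16 GB example heap are mine):

    # Proposed block-count threshold: heap (GB) x 1 million x comfort percentage
    def block_threshold(heap_gb, comfort=0.5):
        return int(heap_gb * 1_000_000 * comfort)

    # Heap actually needed: (blocks + inodes) x bytes per object (150-300 B per [1])
    def heap_needed_bytes(num_blocks, num_inodes, bytes_per_object=300):
        return (num_blocks + num_inodes) * bytes_per_object

    print(block_threshold(16))                              # 8,000,000 blocks for a ~16 GB heap
    print(heap_needed_bytes(1_000_000, 1_000_000) / 2**20)  # ~572 MB for 1M small files

So for a heap like the ~16 GB one in the UI example below, the proposed threshold at 50% comfort would be about 8 million blocks.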
- A high block count may indicate both.
The NameNode UI shows the heap currently used. For example:
http://namenode:50070/dfshealth.html#tab-overview
9,847,555 files and directories, 6,827,152 blocks = 16,674,707 total filesystem object(s).
Heap Memory used 5.82 GB of 15.85 GB Heap Memory. Max Heap Memory is 15.85 GB.
** Note: the heap memory actually used (5.82 GB) is still higher than the estimate of 16,674,707 objects x 300 bytes ≈ 4.65 GB.
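If you prefer to pull these numbers programmatically rather than from the web UI, the NameNode exposes the same information over its JMX servlet. A minimal Python sketch, assuming the standard /jmx endpoint and the usual FSNamesystem and JvmMetrics bean/metric names (verify these against your Hadoop version; the host and port are the same as the UI above):

    import json, urllib.request

    NAMENODE = "http://namenode:50070"  # same host/port as the UI above

    def jmx_bean(query):
        # Query a single JMX bean from the NameNode's /jmx servlet
        with urllib.request.urlopen(NAMENODE + "/jmx?qry=" + query) as resp:
            return json.load(resp)["beans"][0]

    fs = jmx_bean("Hadoop:service=NameNode,name=FSNamesystem")
    jvm = jmx_bean("Hadoop:service=NameNode,name=JvmMetrics")

    objects = fs["FilesTotal"] + fs["BlocksTotal"]
    print("filesystem objects:", objects)
    print("estimated heap    : %.2f GB (at 300 B/object)" % (objects * 300 / 2**30))
    print("actual heap used  : %.2f GB of %.2f GB" % (jvm["MemHeapUsedM"] / 1024, jvm["MemHeapMaxM"] / 1024))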
To check whether a cluster holds mostly small files, look at the average block size reported by fsck:
hdfs fsck / -blocks | grep "Total blocks (validated):"
It returns something like:
Total blocks (validated): 2402 (avg. block size 325594 B) <== the average block size here is well under 1 MB
- Yes, a file is small if its size < dfs.blocksize (128 MB by default).
- Each file occupies its own block(s) on disk, but a block only consumes as much disk space as the file's actual contents, so a small file results in a small block.
- For every new file, an inode object (~150 bytes) plus a block object is created in the NameNode's memory, which puts stress on the NameNode heap (a worked comparison follows below).
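To illustrate the per-file cost described in the last two points, compare roughly the same 1 GB of data stored as many small files versus a few block-sized files (illustrative numbers only, reusing the 300 B/object figure from [1]):

    # ~1 GB of data, two layouts:
    #   1,000,000 x 1 KB files -> 1M inodes + 1M blocks = 2,000,000 objects
    #   8 x 128 MB files       -> 8 inodes + 8 blocks   = 16 objects
    for label, n_files, n_blocks in [("1M x 1 KB", 1_000_000, 1_000_000), ("8 x 128 MB", 8, 8)]:
        print(label, (n_files + n_blocks) * 300, "bytes of NameNode heap")

That is roughly 572 MB of NameNode heap for the small-file layout versus about 5 KB for the large-file layout, for the same amount of data.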
Impact on the NameNode and DataNodes:
Small files pose problems for both the NameNode and the DataNodes:
NameNode:
- The ceiling on the number of files the cluster can hold is lowered, because the NameNode keeps metadata for every file in memory.
- Restarts take longer, because the NameNode must read the metadata of every file from the fsimage on local disk.
DataNodes:
- A large number of small files means a large amount of random disk I/O; HDFS is designed for large files and benefits from sequential reads.
[1] https://www.cloudera.com/documentation/enterprise/5-8-x/topics/admin_nn_memory_config.html