
I am new to Hadoop, and I know that HDFS uses 64 MB per block by default (and this can be increased depending on the system). But since HDFS sits on top of a Linux filesystem, which uses 4 KB blocks, doesn't Hadoop suffer from disk seeks? Also, does HDFS interact with the Linux filesystem?
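For reference, here is a minimal sketch (assuming a Hadoop client classpath and a hypothetical path `/user/demo/input.txt`) showing that the block size is a per-file HDFS property, queried through the standard `FileSystem` API; `dfs.blocksize` is the configuration key in Hadoop 2.x and later:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The block size is configurable per cluster and even per file;
        // 64 MB was the old default, newer releases default to 128 MB.
        // conf.set("dfs.blocksize", "134217728"); // e.g. request 128 MB blocks

        FileSystem fs = FileSystem.get(conf);
        Path p = new Path("/user/demo/input.txt"); // hypothetical path
        FileStatus status = fs.getFileStatus(p);
        System.out.println("HDFS block size for this file: "
                + status.getBlockSize() / (1024 * 1024) + " MB");
        fs.close();
    }
}
```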

  • The reason it is 4 KB is the page-faulting mechanism, as pages are 4 KB. Why do you think this increases seeks? The reason for big chunks like 64 MB is that you can read them sequentially from disk and not seek around. – Thomas Jungblut Jun 12 '15 at 22:45
  • Well, I know why Linux has 4 KB blocks. My question is: since Hadoop is installed on top of Linux and Hadoop uses 64 MB per block, when Hadoop's data is stored on disk, will it go through the Linux filesystem? – Tumpiri Sydney Rockwell Jun 14 '15 at 04:08

1 Answer


Your thinking is correct to a certain extent, but look at the bigger picture. Yes, HDFS interacts with the Linux filesystem: each 64 MB HDFS block is stored as an ordinary file on a DataNode's local filesystem, which maps it onto its own 4 KB blocks underneath. Two things keep seeks from becoming a problem. First, a 64 MB block is read sequentially, so the disk streams data instead of seeking around. Second, a file's blocks are distributed across many nodes; if you want to read 3 blocks stored on 3 different machines, the reads happen in parallel, so you pay roughly one seek's worth of latency rather than three.
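To see that distribution, here is a minimal sketch (again assuming a Hadoop client classpath and a hypothetical path) that lists which DataNodes hold each block of a file, using `FileSystem.getFileBlockLocations`:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path("/user/demo/big-file.dat"); // hypothetical path
        FileStatus status = fs.getFileStatus(p);

        // One BlockLocation per HDFS block; each lists the DataNodes that
        // hold a replica, so different blocks can be read in parallel
        // from different machines.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (int i = 0; i < blocks.length; i++) {
            System.out.printf("block %d: offset=%d length=%d hosts=%s%n",
                    i, blocks[i].getOffset(), blocks[i].getLength(),
                    String.join(",", blocks[i].getHosts()));
        }
        fs.close();
    }
}
```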

I think this might help: How are HDFS files getting stored on underlying OS filesystem?
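To illustrate the local-filesystem side from that link: on a DataNode, HDFS blocks show up as ordinary Linux files named `blk_<id>` under the directory configured by `dfs.datanode.data.dir`. Here is a minimal sketch (assuming a hypothetical data directory `/hadoop/hdfs/data`) that lists a few of them with plain `java.nio`:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ListBlockFiles {
    public static void main(String[] args) throws IOException {
        // Hypothetical DataNode storage directory; the real location is
        // whatever dfs.datanode.data.dir points to on your cluster.
        Path dataDir = Paths.get("/hadoop/hdfs/data");
        try (Stream<Path> files = Files.walk(dataDir)) {
            files.filter(f -> f.getFileName().toString().startsWith("blk_"))
                 .limit(10) // just a sample
                 .forEach(f -> {
                     try {
                         System.out.printf("%s (%d bytes)%n", f, Files.size(f));
                     } catch (IOException e) {
                         System.err.println("cannot stat " + f);
                     }
                 });
        }
    }
}
```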

– New Coder