
I would like to know how a block gets created. Does HDFS create 64 MB blocks by default on the filesystem, or does it create them based on file transfer activity?

Assume I have set up a 10-node cluster and am installing Hadoop on all the nodes. How do blocks get created now? Once I start the HDFS services, are the blocks created on the Linux systems? Does it create a physical 64 MB block out of 4 KB blocks (the basic filesystem block size)?

or

When I move a file of size 128 MB, two blocks will be created. Does the block creation happen in parallel on two nodes? Which component actually splits the file into blocks?

I am just a beginner with Hadoop, and I am asking these questions to get a clear understanding.

Karthi
  • HDFS blocks are different from normal filesystem blocks. They are not based on file transfer activity, and blocks work the same regardless of how many nodes are in the cluster. Each block also has a replication factor of 3, by default. There will be *at least 2* blocks used for a 128MB file because a file can span multiple blocks and blocks can contain partial files – OneCricketeer Mar 03 '16 at 05:30
  • Thanks. You are saying that HDFS blocks get created on top of a Linux filesystem (of multiple 4 KB blocks) once we install and run the HDFS services on all the nodes. Once an input file arrives, the system will split the file into 64 MB (say) chunks and place each chunk into a block. – Karthi Mar 03 '16 at 05:35
  • Correct, HDFS is a logical filesystem across the cluster, pooling together all the physical filesystems across the cluster. I believe it's Mapreduce that handles splitting the files and writing those splits to disk – OneCricketeer Mar 03 '16 at 05:42
  • Not MapReduce, I think, because even if I just move a file to HDFS, the file will be split into blocks, so the HDFS services must do it. – Karthi Mar 03 '16 at 05:48
  • HDFS is only a file system. Mapreduce reads and writes to it. – OneCricketeer Mar 03 '16 at 05:49
  • For example, if I issue a command like hadoop fs -copyFromLocal source destination, no MapReduce is involved. How do you think the files are getting created then? – Karthi Mar 03 '16 at 05:54
  • Well, I can tell you the files aren't directly copied... The namenode is contacted, then a block is allocated and the file splits are "mapped across" and "reduced" into an HDFS datanode block... The whole process is far too broad for a StackOverflow post, though, so here is the first link I found. http://www.devinline.com/2015/03/read-and-write-operation-in-hadoop.html – OneCricketeer Mar 03 '16 at 06:03 (see the code sketch after these comments)
  • Thanks! This is very clear. You can post this as answer and I will mark it. – Karthi Mar 03 '16 at 06:22
  • @Karthi: Have a look at this question to understand how write operation works: http://stackoverflow.com/questions/34464187/hadoop-file-write/34464676#34464676 – Ravindra babu Mar 07 '16 at 15:47
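To make the write path discussed in the comments concrete, here is a minimal sketch of the same copy that hadoop fs -copyFromLocal performs, done through the Java FileSystem API, followed by a query that lists how the file was cut into blocks and which DataNodes hold the replicas. The paths are hypothetical, and the code assumes a client machine whose core-site.xml/hdfs-site.xml point at the cluster; it is an illustration, not the shell command's actual implementation.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyAndInspectBlocks {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);

            // Equivalent of: hadoop fs -copyFromLocal /tmp/input.dat /user/karthi/input.dat
            // (hypothetical paths)
            Path local = new Path("/tmp/input.dat");
            Path remote = new Path("/user/karthi/input.dat");
            fs.copyFromLocalFile(local, remote);

            // Ask the NameNode how the file was laid out: one BlockLocation per block,
            // each listing the DataNodes that hold a replica of that block.
            FileStatus status = fs.getFileStatus(remote);
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            System.out.println("File length : " + status.getLen());
            System.out.println("Block size  : " + status.getBlockSize());
            System.out.println("Replication : " + status.getReplication());
            for (BlockLocation b : blocks) {
                System.out.println("Block at offset " + b.getOffset()
                        + ", length " + b.getLength()
                        + ", hosts " + String.join(",", b.getHosts()));
            }
            fs.close();
        }
    }

For a 128 MB file with a 64 MB block size, the loop would print two block entries, each typically listing three hosts (the default replication factor).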

1 Answer


This is the best material I have ever found for HDFS beginners. It answers your questions simply, via a vivid comic.

A good client always knows these two things: BlockSize and Replication Factor

HDFS explained as comics
https://drive.google.com/file/d/0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1/view
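The comic's point that a good client knows the block size and replication factor shows up directly in the API: the client can pass both when it creates a file. Below is a minimal sketch, with an illustrative path and values (64 MB was the default block size in older Hadoop releases; newer ones default to 128 MB):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteWithExplicitBlockSize {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path out = new Path("/user/karthi/with-explicit-settings.dat"); // hypothetical path
            short replication = 3;                  // default replication factor
            long blockSize = 64L * 1024 * 1024;     // 64 MB block size
            int bufferSize = conf.getInt("io.file.buffer.size", 4096);

            // The client hands the block size and replication factor to the NameNode
            // when it asks for the file to be created; blocks are then allocated as
            // the output stream fills them.
            try (FSDataOutputStream stream =
                     fs.create(out, true, bufferSize, replication, blockSize)) {
                stream.writeBytes("hello hdfs\n");
            }
            fs.close();
        }
    }

Blocks are allocated only as the stream fills them, so a small file does not reserve a full 64 MB on the DataNodes' local disks; each HDFS block is stored there as an ordinary file made of the underlying filesystem's 4 KB blocks.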

Shawn Guo