
I am working on Apache Hadoop 2.7.1,

and I am adding files whose size does not exceed 100 KB.

So whether I configure the block size to be 1 MB or leave it at the default value, which is 128 MB,

my files will not be affected, because each of them will be saved in a single block,

and a single block will be retrieved when the file is downloaded.

But what is the difference in block storage size? I mean, does storing files with a 1 MB block size differ from storing them with a 128 MB block size when the files are smaller than 1 MB?

In other words, when a 1 MB file is stored in a block of size 128 MB, does it reserve the whole block, so that the block cannot be used for other files? Or will the empty space be used for other files, with a pointer referring to each file's start location within the block?

I found no difference in upload and download times. Are there any other points I have to consider?
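For reference, this is roughly how I compare the configured block size against the actual file length and block count, using the standard Hadoop FileSystem API (a minimal sketch; the path is just a placeholder for one of my files):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder path; substitute one of the ~100 KB files
        Path file = new Path("/user/test/sample.txt");
        FileStatus status = fs.getFileStatus(file);

        // getBlockSize() is the block size the file was written with;
        // getLen() is the number of bytes actually stored
        System.out.println("Configured block size: " + status.getBlockSize());
        System.out.println("Actual file length:    " + status.getLen());
        System.out.println("Blocks used:           "
                + fs.getFileBlockLocations(status, 0, status.getLen()).length);
    }
}
```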

oula alshiekh
  • What is this question even about? – philantrovert May 10 '17 at 13:02
  • I meant: given that our files can be saved in one block, is it better to choose the block size as 1 MB (the size of the largest file we will have), or to keep the default block size, which is 128 MB? – oula alshiekh May 10 '17 at 13:49
  • http://stackoverflow.com/questions/19473772/data-block-size-in-hdfs-why-64mb discusses why the Hadoop block size defaults to a large value. If you reduce it, you will likely run into issues with the NameNode being unable to hold all the metadata in RAM for your total disk storage. It might be worth reconsidering your approach if you are looking to store a vast number of really small files, as that doesn't seem to be what Hadoop was designed for. – mc110 May 10 '17 at 14:32

1 Answer


I am going to cite the (now discontinued) SO documentation for this, written by me, because why not.

Say, for example, you have a file of size 1024 MB. If your block size is 128 MB, the file will be split into 8 blocks of 128 MB each. This means the NameNode will need to store metadata for 8 × 3 = 24 blocks (3 being the replication factor).

Consider the same scenario with a block size of 4 KB. That results in 1024 MB / 4 KB = 262,144 blocks, which requires the NameNode to keep metadata for 262,144 × 3 = 786,432 block replicas for just a single 1 GB file. Since all of this metadata is held in memory, a larger block size is preferred to save that bit of extra load on the NameNode.
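If you do want to experiment with different block sizes, note that the block size is a per-file property that can be set at write time, not only cluster-wide. A minimal sketch (the path and contents are illustrative; the block size must remain a multiple of the checksum chunk size, 512 bytes by default):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteWithCustomBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Illustrative destination path
        Path out = new Path("/user/test/small-block-file.txt");
        short replication = 3;
        long blockSize = 1L * 1024 * 1024; // 1 MB instead of the 128 MB default

        // create(path, overwrite, bufferSize, replication, blockSize)
        try (FSDataOutputStream os = fs.create(out, true,
                conf.getInt("io.file.buffer.size", 4096),
                replication, blockSize)) {
            os.writeBytes("hello hdfs");
        }
    }
}
```

The same thing can usually be done from the shell by overriding the property for a single upload, e.g. `hdfs dfs -D dfs.blocksize=1048576 -put localfile /dest`.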

philantrovert