
I got this in the output, and I want to know what BP and blk are. Can you explain what each part of this output means? I know the

 BP-929597290-192.0.0.2-1439573305237:blk_1074084574_344316 len=2 repl=3 [DatanodeInfoWithStorage[192.0.0.9:1000,DS-730a75d3-046c-4254-990a-4eee9520424f,DISK], DatanodeInfoWithStorage[192.0.0.1:1000,DS-fc6ee5c7-e76b-4faa-b663-58a60240de4c,DISK], DatanodeInfoWithStorage[192.0.0.3:1000,DS-8ab81b26-309e-42d6-ae14-26eb88387cad,DISK]]

I guess 192.0.0.9:1000 is the IP of the first replica of the data.

Naveen

1 Answer

  1. BP-929597290-192.0.0.2-1439573305237

    This is the Block Pool ID. A block pool is a set of blocks that belong to a single namespace. For simplicity, you can say that all the blocks managed by a Name Node are under the same Block Pool.

    The Block Pool ID is formed as:

    String bpid = "BP-" + rand + "-" + ip + "-" + Time.now();
    
    Where: 
    rand = a random number
    ip = IP address of the Name Node
    Time.now() = current system time when the ID was created, in milliseconds since the Unix epoch
    

    Read about Block Pools here: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html

  2. blk_1074084574_344316:

    Block ID of the block. Each block in HDFS is given a unique identifier.

    The block ID is formed as:

    blk_<blockid>_<genstamp> 
    
    Where: 
    blockid = ID of the block
    genstamp = an incrementing integer that records the version of a particular block
    

    Read about generation stamp here: http://blog.cloudera.com/blog/2009/07/file-appends-in-hdfs/

  3. len=2

    Length of the block: Number of bytes in the block

  4. repl=3

    There are 3 replicas of this block

  5. DatanodeInfoWithStorage[192.0.0.9:1000,DS-730a75d3-046c-4254-990a-4eee9520424f,DISK]

    Where:

    192.0.0.9 => IP address of the Data Node holding this block
    1000 => Data streaming port
    DS-730a75d3-046c-4254-990a-4eee9520424f => Storage ID. It is an internal ID of the Data Node, assigned when the Data Node registers with the Name Node
    DISK => storageType. It is DISK here. Storage type can be: RAM_DISK, SSD, DISK and ARCHIVE
    

The description in point 5 also applies to the remaining 2 replicas of this block:

DatanodeInfoWithStorage[192.0.0.1:1000,DS-fc6ee5c7-e76b-4faa-b663-58a60240de4c,DISK], 
DatanodeInfoWithStorage[192.0.0.3:1000,DS-8ab81b26-309e-42d6-ae14-26eb88387cad,DISK]]
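
To make the layout above concrete, here is a minimal Java sketch that splits one such line into the fields described in points 1 to 5. The class `FsckBlockLine` and its field names are hypothetical and not part of Hadoop. The regular expressions are written only against the sample line in the question, so other Hadoop versions may print a different format. It also assumes that the last part of the Block Pool ID is a timestamp in milliseconds since the Unix epoch.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: splits one block line into its fields.
class FsckBlockLine {
    // BP-<rand>-<Name Node IP>-<Time.now()>:blk_<blockid>_<genstamp> len=<bytes> repl=<count>
    private static final Pattern HEADER = Pattern.compile(
            "BP-(\\d+)-([\\d.]+)-(\\d+):blk_(-?\\d+)_(\\d+) len=(\\d+) repl=(\\d+)");

    // DatanodeInfoWithStorage[<ip>:<port>,<storage ID>,<storage type>]
    private static final Pattern REPLICA = Pattern.compile(
            "DatanodeInfoWithStorage\\[([\\d.]+):(\\d+),(DS-[0-9a-fA-F-]+),(\\w+)\\]");

    final String nameNodeIp;     // IP embedded in the Block Pool ID
    final Instant poolCreated;   // assumption: epoch milliseconds
    final long blockId;
    final long genStamp;
    final long length;           // bytes in the block
    final int replication;
    final List<String> dataNodes = new ArrayList<>(); // "ip:port", one per replica

    FsckBlockLine(String line) {
        Matcher h = HEADER.matcher(line);
        if (!h.find()) {
            throw new IllegalArgumentException("Unrecognized block line: " + line);
        }
        nameNodeIp = h.group(2);
        poolCreated = Instant.ofEpochMilli(Long.parseLong(h.group(3)));
        blockId = Long.parseLong(h.group(4));
        genStamp = Long.parseLong(h.group(5));
        length = Long.parseLong(h.group(6));
        replication = Integer.parseInt(h.group(7));

        // One DatanodeInfoWithStorage[...] entry is listed per replica.
        Matcher r = REPLICA.matcher(line);
        while (r.find()) {
            dataNodes.add(r.group(1) + ":" + r.group(2));
        }
    }

    public static void main(String[] args) {
        FsckBlockLine b = new FsckBlockLine(
                "BP-929597290-192.0.0.2-1439573305237:blk_1074084574_344316 len=2 repl=3 "
                + "[DatanodeInfoWithStorage[192.0.0.9:1000,DS-730a75d3-046c-4254-990a-4eee9520424f,DISK], "
                + "DatanodeInfoWithStorage[192.0.0.1:1000,DS-fc6ee5c7-e76b-4faa-b663-58a60240de4c,DISK], "
                + "DatanodeInfoWithStorage[192.0.0.3:1000,DS-8ab81b26-309e-42d6-ae14-26eb88387cad,DISK]]");
        System.out.println("Name Node IP:  " + b.nameNodeIp);
        System.out.println("Pool created:  " + b.poolCreated);
        System.out.println("Block ID:      " + b.blockId + " (genstamp " + b.genStamp + ")");
        System.out.println("Length (bytes): " + b.length + ", replicas: " + b.dataNodes);
    }
}
```

Under that timestamp assumption, the 1439573305237 in the sample decodes to 14 August 2015 (UTC), which would be when the block pool was created rather than anything about this particular block.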
Manjunath Ballur
  • Manjunath Can you explain this in detail? 192.0.0.2-1439573305237:blk_1074084574_344316. So BP is the block pool where the information of data node blocks of that file is stored. 192.0.0.2 is the ip of the name node that gave the results, what does -1439573305237 and :blk_1074084574_344316 contain? – Naveen Dec 28 '15 at 17:24
  • I have clearly explained how BP ID is formed: String bpid = "BP-" + rand + "-"+ ip + "-" + Time.now(); Block Pool means "Pool of Blocks belonging to the same Name Node". All the blocks under a Name Node, will have the same Block Pool ID. I have explained about "blk_1074084574_344316" also. I hope, you read the entire post. – Manjunath Ballur Dec 28 '15 at 17:29
  • Thanks Manjunath, you said what they are. Can you tell me what they are storing? Is the address of these 3 data node blocks stored in blk or BP? – Naveen Dec 28 '15 at 17:40
  • I am sorry, I don't understand your question. The BP ID is common to all 3 replicas. The information for each replica is contained in DatanodeInfoWithStorage[]. That's why you see 3 instances of DatanodeInfoWithStorage[]. – Manjunath Ballur Dec 28 '15 at 17:44
  • I mean, what does blk store here? – Naveen Dec 28 '15 at 17:45
  • That is a block ID. I hope you understand the concept of HDFS. A file is split into 'n' blocks. Each block is stored with a replication factor of 3 by default. "blk_" identifies a unique block ID for a particular block. This block is stored on 3 different data nodes. The information on where exactly each replica of this block is stored is contained in the "DatanodeInfoWithStorage" structure. – Manjunath Ballur Dec 28 '15 at 17:48
  • Thanks Manjunath. Now I got it – Naveen Dec 28 '15 at 17:53
  • How can I find which disk DS-fc6ee5c7-e76b-4faa-b663-58a60240de4c belongs to? If I know which disk it is, I can delete only certain files on that disk. – Plugie May 23 '19 at 04:12
  • Check this: https://stackoverflow.com/questions/6372060/how-to-track-which-data-block-is-in-which-data-node-in-hadoop – Manjunath Ballur May 27 '19 at 03:58