
What exactly is involved in namenode formatting? If I type the following command into my terminal within my Hadoop installation folder:

  bin/hadoop namenode -format

What exactly does it accomplish? I want to understand the principles of namenode formatting and its significance. Thanks...

Ace

3 Answers


The Hadoop NameNode is the centralized part of an HDFS file system: it keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. In short, it keeps the metadata about the data stored on the datanodes. When we format the namenode, that metadata is erased. By doing that, all the information on the datanodes is effectively lost, and they can be reused for new data.
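A toy model can make the split between NameNode metadata and DataNode data concrete. This is a heavily simplified sketch, not Hadoop's code: the class and method names are hypothetical, and `namespace_id` is only a stand-in for the identifier Hadoop writes into the name directory's VERSION file when formatting (Hadoop 2.x also writes a clusterID). In real Hadoop, a DataNode still holding storage from the old namespace typically fails to start against the reformatted NameNode until its data directory is cleared; the `accepts` method models that check very loosely.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical, heavily simplified model of the metadata a NameNode keeps."""

    # Stand-in for the namespace identifier written when the NameNode is formatted.
    namespace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Directory tree, flattened: file path -> ordered list of block IDs.
    files: dict = field(default_factory=dict)
    # Block ID -> set of DataNode names believed to hold a replica.
    block_locations: dict = field(default_factory=dict)

    def add_file(self, path, blocks_to_nodes):
        """Record a file made of the given blocks and where each block lives."""
        self.files[path] = list(blocks_to_nodes)
        for block_id, nodes in blocks_to_nodes.items():
            self.block_locations[block_id] = set(nodes)

    def format(self):
        """Discard all metadata and start an empty namespace with a new ID.

        Nothing is sent to DataNodes here; their block files stay on disk.
        """
        self.namespace_id = uuid.uuid4().hex
        self.files.clear()
        self.block_locations.clear()

    def accepts(self, datanode_namespace_id):
        """Accept a DataNode only if it carries the current namespace ID."""
        return datanode_namespace_id == self.namespace_id


if __name__ == "__main__":
    nn = ToyNameNode()
    nn.add_file("/user/ace/data.txt", {"blk_1": ["dn1", "dn2"], "blk_2": ["dn2"]})
    dn_namespace_id = nn.namespace_id  # ID the DataNodes saw before formatting
    nn.format()
    print(nn.files, nn.accepts(dn_namespace_id))  # {} False
```

The point of the model is that formatting only touches state held on the NameNode side; nothing in `format` reaches out to the DataNodes.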

techvineet
  • Thanks for explaining what the NameNode and DataNode are and what they do. I have set up Hadoop to operate in pseudo-distributed mode on my local Ubuntu installation. I was looking to explore the exact mechanics of formatting: what data is changed, how those changes are propagated to the datanodes, which protocols are involved, etc. I know it's all too big to fit into one answer; I was just looking to get a nice summary. But thanks for your answer though. – Ace Sep 18 '13 at 14:13
  • @techvineet You mentioned that the information on the datanodes is lost. When we format the NameNode, why should data on the DataNode be lost? – Vinod Jayachandran Sep 22 '15 at 06:38

hadoop namenode -format formats your file system at the location specified in hdfs-site.xml (the dfs.name.dir property).

Here, my namenode directory is /usr/local/hadoop/dfs/name:

<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/dfs/name</value>
<final>true</final>
</property>
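To check which directory a format would act on, a short script can read that property from hdfs-site.xml. This is a sketch under assumptions: the helper name `namenode_dirs` is hypothetical, it assumes the usual `<configuration>` root element, and it checks both `dfs.name.dir` (the name used above) and `dfs.namenode.name.dir` (the name newer Hadoop releases use, which still accept the old one as deprecated).

```python
import xml.etree.ElementTree as ET

# dfs.name.dir is the property used in the answer above; newer Hadoop releases
# call it dfs.namenode.name.dir and still accept the old name as deprecated.
NAME_DIR_KEYS = ("dfs.namenode.name.dir", "dfs.name.dir")


def namenode_dirs(root):
    """Return the NameNode metadata directories set in a parsed hdfs-site.xml."""
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        props[name] = value
    for key in NAME_DIR_KEYS:
        if props.get(key):
            # The value may list several directories separated by commas.
            return [d.strip() for d in props[key].split(",") if d.strip()]
    # Unset: Hadoop falls back to a default under hadoop.tmp.dir.
    return []


SAMPLE = """<configuration>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/dfs/name</value>
<final>true</final>
</property>
</configuration>"""

if __name__ == "__main__":
    print(namenode_dirs(ET.fromstring(SAMPLE)))  # ['/usr/local/hadoop/dfs/name']
```

For a real file, `namenode_dirs(ET.parse(path).getroot())` works the same way; where hdfs-site.xml lives depends on your installation.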
hongsy
Alkesh_IT

Simply put:

  • The NameNode contains metadata about the data stored in the DataNodes.
  • If the NameNode is formatted, only the metadata gets deleted.
  • The original data in the DataNodes is not affected.
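The orphaned-data point raised in this answer and its comments can be illustrated with a small sketch. Here the NameNode's metadata is reduced to a dict from file path to block IDs, the DataNode's disk to a list of block IDs, and the function name and block names are hypothetical placeholders; this is not how Hadoop itself tracks blocks.

```python
def orphaned_blocks(namenode_files, datanode_blocks):
    """Return block IDs on a DataNode that no file in the namespace references.

    namenode_files: dict mapping file path -> list of block IDs (NameNode metadata).
    datanode_blocks: iterable of block IDs whose files exist on the DataNode's disk.
    """
    referenced = {blk for blocks in namenode_files.values() for blk in blocks}
    return sorted(set(datanode_blocks) - referenced)


if __name__ == "__main__":
    # Before formatting: every block on disk belongs to some file.
    files = {"/user/ace/a.txt": ["blk_1", "blk_2"], "/user/ace/b.txt": ["blk_3"]}
    on_disk = ["blk_1", "blk_2", "blk_3"]
    print(orphaned_blocks(files, on_disk))  # []

    # Formatting clears the NameNode metadata but, in this model, not the disk.
    files = {}
    print(orphaned_blocks(files, on_disk))  # ['blk_1', 'blk_2', 'blk_3']
```

After the format every block is still physically present but unreachable through the namespace, which is exactly why it is both "not deleted" and "useless" until the disk is cleaned.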
Dinesh Kumar P
  • Yes, but the "original data in DataNode" is orphaned and useless, because the NameNode no longer references it. – Thanga Jan 06 '16 at 14:42
  • For posterity, the distinction is not academic. If you back up your namenode metadata, it matters a great deal that the data *may* not be gone from the datanode filesystems. Also, if your datanodes are not secure *enough for your security needs*, the fact that orphaned data may leak because it hasn't been adequately scrubbed from disk may matter a lot. – jennykwan Dec 24 '18 at 01:25