What the command "hadoop namenode -format" will do

Question

I am trying to learn Hadoop by following a tutorial and setting up pseudo-distributed mode on my machine.

My core-site.xml is:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
      <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.       
      </description>   
   </property>
</configuration>

My hdfs-site.xml file is:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
      <description>The actual number of replications can be specified when the
        file is created.
      </description>
   </property>
</configuration>

My mapred-site.xml file is:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
   <property>      
      <name>mapred.job.tracker</name>
      <value>localhost:9001</value>
      <description>The host and port that the MapReduce job tracker runs
        at.
      </description>
   </property>
</configuration>

The command below ran successfully, but what is it actually doing?

hadoop-1.2.1$ bin/hadoop namenode -format
14/11/26 12:37:16 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = myhost/127.0.0.8
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.6.0_45
************************************************************/
14/11/26 12:37:17 INFO util.GSet: Computing capacity for map BlocksMap
14/11/26 12:37:17 INFO util.GSet: VM type       = 64-bit
14/11/26 12:37:17 INFO util.GSet: 2.0% max memory = 932118528
14/11/26 12:37:17 INFO util.GSet: capacity      = 2^21 = 2097152 entries
14/11/26 12:37:17 INFO util.GSet: recommended=2097152, actual=2097152
14/11/26 12:37:17 INFO namenode.FSNamesystem: fsOwner=myuser
14/11/26 12:37:17 INFO namenode.FSNamesystem: supergroup=supergroup
14/11/26 12:37:17 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/11/26 12:37:17 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/11/26 12:37:17 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/11/26 12:37:17 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/11/26 12:37:17 INFO namenode.NameNode: Caching file names occuring more than 10 times 
14/11/26 12:37:17 INFO common.Storage: Image file /tmp/hadoop-myuser/dfs/name/current/fsimage of size 115 bytes saved in 0 seconds.
14/11/26 12:37:18 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop-myuser/dfs/name/current/edits
14/11/26 12:37:18 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop-myuser/dfs/name/current/edits
14/11/26 12:37:18 INFO common.Storage: Storage directory /tmp/hadoop-myuser/dfs/name has been successfully formatted.
14/11/26 12:37:18 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at chaitanya-OptiPlex-3010/127.0.0.8
************************************************************/

Can someone please explain what it is doing internally?

I have gone through these posts, but neither gives a correct explanation:

What exactly is hadoop namenode formatting?

hadoop namenode is not formatting

How can I check this practically on my machine, so that I can see the difference before and after running the command? I am new to Hadoop, so this may be a trivial question.

http://stackoverflow.com/a/18873340/3496666 look at this answer. — Kumar, Nov 26 '14 at 09:36
@Kumar, I still have same question the OP has posted in comments, but question is how to check this practically on my machine to see the difference before and after running the command. I am new to Hadoop so wanted to know what this command does. — learner, Nov 26 '14 at 09:43
see my answer. formatting the namenode will not affect the datanode. Namenode creates a new namespace id. — Kumar, Nov 26 '14 at 09:46
Possible duplicate of [What exactly is hadoop namenode formatting?](https://stackoverflow.com/questions/18862875/what-exactly-is-hadoop-namenode-formatting) — hongsy, Nov 05 '17 at 20:27
I am getting the exact same error, can you please guide me what to do next? — MAULIK MODI, Jan 02 '19 at 19:09

score 20 · Answer 1 · edited Nov 26 '14 at 09:16


The command hadoop namenode -format deletes all files in your HDFS.

The tmp directory contains two folders, datanode and namenode, in the local file system. If you format the namenode, these two folders become empty.

Note: if you want to format your namenode, first stop all Hadoop services, then delete the tmp folder (which contains the namenode and datanode folders) in your local file system, and then start the Hadoop services again. The format will then take effect.

Reason for hadoop namenode -format:

The Hadoop NameNode is the central place of an HDFS file system. It keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. In short, it keeps the metadata related to the datanodes. When we format the namenode, it formats the metadata related to the datanodes. By doing that, all the information about the datanodes is lost and they can be reused for new data.

By default, the namenode location is "/tmp/hadoop-myuser/dfs/name".

When you format the namenode, this location is cleared.

To change the namenode location, add the following properties to hdfs-site.xml (these are the Hadoop 2.x property names; Hadoop 1.x uses dfs.name.dir and dfs.data.dir):

<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/search/data/dfs/namenode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/search/data/dfs/datanode</value>
</property>

I hope this will help you.. :-)

edited Nov 26 '14 at 09:16

ǨÅVËĔŊ RĀǞĴĄŅ


answered Nov 26 '14 at 08:42

Suresh Ram



@ǨÅVËĔŊ RĀǞĴĄN, Thanks Kaveen, for testing I have created a file called `test.txt` under `/tmp/hadoop-myuser/dfs/name` then I ran `stop-all.sh` then ran the command to format then tried to check if the `test.txt` got deleted but the file is still there. Can you please tell me why the file is still present even after formatting? I am a newbie to Hadoop so may be this is a trivial question. – learner Nov 26 '14 at 09:18
Did you check whether "text.txt" was available in HDFS before stopping all services? Check at http://localhost:50070 in your browser – ǨÅVËĔŊ RĀǞĴĄŅ Nov 26 '14 at 09:28
@ǨÅVËĔŊRĀǞĴĄŅ, No, the file is not present there when I open port 50070. Is it possible to check practically how the formatting works? Can you please give me some steps? I am a newbie to Hadoop, so this might be a trivial question. – learner Nov 26 '14 at 09:33
Hadoop only formats the files under the location that dfs.namenode.name.dir points to – Rengasamy Nov 26 '14 at 09:44

score 11 · Accepted Answer · answered Nov 26 '14 at 09:53


Hadoop namenode -format

The Hadoop namenode directory contains the fsimage and edits files, which hold the basic information about the Hadoop file system, such as where the data is located and which user created which files.
If you format the namenode, this information is deleted from the namenode directory, which is specified in hdfs-site.xml as dfs.namenode.name.dir.
The data itself is still present in Hadoop, but the namenode metadata describing it is gone.
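
One way to observe this on a test machine is to record what the namenode directory contains before and after the format and compare the two listings. The following is a minimal shell sketch, not part of Hadoop: snapshot_dir is a hypothetical helper name, and the directory used in the commented usage is simply the path that appears in the log output of the question.

```shell
#!/bin/sh
# snapshot_dir DIR
# Hypothetical helper: prints one line per regular file under DIR in the form
#   <CRC checksum> <size in bytes> <relative path>
# sorted by path, so that two snapshots can be compared with diff.
snapshot_dir() {
    dir=$1
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    (
        cd "$dir" || exit 1
        # cksum is POSIX and prints "<CRC> <bytes> <name>" for each file.
        find . -type f -exec cksum {} + | LC_ALL=C sort -k 3
    )
}

# Example usage (assumed path, taken from the log in the question):
#   NAME_DIR=/tmp/hadoop-myuser/dfs/name
#   snapshot_dir "$NAME_DIR" > /tmp/name-before.txt
#   bin/hadoop namenode -format
#   snapshot_dir "$NAME_DIR" > /tmp/name-after.txt
#   diff /tmp/name-before.txt /tmp/name-after.txt
```

The diff then lists the files whose checksum or size changed. Running the same snapshot against the datanode directory is a way to check whether the format touched it at all.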

answered Nov 26 '14 at 09:53

Rengasamy


The explanation is good. But problem is I want to see the changes practically to make sure that the explanation is correct. Kaveen has given steps to see the differences but I am facing issue with `put` command, so posted another question - http://stackoverflow.com/questions/27147096/hadoop-put-command-throws-could-only-be-replicated-to-0-nodes-instead-of-1 – learner Nov 26 '14 at 10:53

Kumar · Answer 3 · 2017-05-03T04:16:08.443

Actually, formatting the namenode will not format the datanode.

It only formats the contents of your namenode (which holds the details of the datanodes). Your namenode will no longer know where your data is. Also, namenode -format assigns a new namespace ID to the namenode.

You have to change the namespaceID in your datanode to make the datanode work again. It is stored in dfs/data/current/VERSION.

There is a JIRA open for this, HDFS-107, suggesting that the datanode be formatted as well when the namenode is formatted.
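
To check the namespace ID point on a test machine, the two VERSION files can be compared directly. This is a minimal sketch that assumes the Hadoop 1.x layout, where each VERSION file is a plain key=value file containing a namespaceID line. The function names get_namespace_id and compare_namespace_ids are hypothetical, and the paths in the commented usage are assumptions based on the default directory shown in the question.

```shell
#!/bin/sh
# get_namespace_id FILE
# Prints the value of the namespaceID line in a storage VERSION file.
get_namespace_id() {
    sed -n 's/^namespaceID=//p' "$1"
}

# compare_namespace_ids NAMENODE_VERSION DATANODE_VERSION
# Prints both IDs, then "match" (exit status 0) or "mismatch" (exit status 1).
compare_namespace_ids() {
    nn_id=$(get_namespace_id "$1") || return 2
    dn_id=$(get_namespace_id "$2") || return 2
    echo "namenode namespaceID=$nn_id"
    echo "datanode namespaceID=$dn_id"
    if [ -n "$nn_id" ] && [ "$nn_id" = "$dn_id" ]; then
        echo "match"
    else
        echo "mismatch"
        return 1
    fi
}

# Example usage (assumed default paths for the user in the question):
#   compare_namespace_ids \
#       /tmp/hadoop-myuser/dfs/name/current/VERSION \
#       /tmp/hadoop-myuser/dfs/data/current/VERSION
```

Running the comparison before and after a format should make the new namespace ID visible. A "mismatch" after formatting would be consistent with the datanode still holding the old ID.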

score 3 · Answer 4 · answered Nov 26 '14 at 08:24


Namenode contains metadata about the Hadoop filesystem.

This command (bin/hadoop namenode -format) formats the whole Hadoop Distributed File System (HDFS). So if you run it on an existing file system, you will lose all your data.

answered Nov 26 '14 at 08:24

Abhijeet Dhumal


can you please tell me where it exists on my machine and how can I see if the files are formatted or not? – learner Nov 26 '14 at 08:27

I am not sure if we can monitor the namenode -format command. In the hdfs-site.xml file we provide the dfs directory location: <property><name>dfs.name.dir</name><value>/usr/local/hadoop/dfs/name</value><final>true</final></property>. So this location will get formatted. – Abhijeet Dhumal Nov 26 '14 at 09:12

score 0 · Answer 5 · edited May 23 '17 at 11:47


Steps:

Start all the services using "start-all.sh".

Check whether the services are running using "jps". Note: if you use Hadoop 2.3.0, the following services need to be running:

NameNode
DataNode
ResourceManager
NodeManager

Copy a file from the local file system to HDFS using hdfs dfs -put, with / as the destination.

Now check the location "/tmp/hadoop-myuser/dfs/name"; you may find the file split into blocks of 64 MB each.

Then format using **hadoop namenode -format**. Now the file is no longer physically available at that location.


edited May 23 '17 at 11:47

Community


answered Nov 26 '14 at 09:50

ǨÅVËĔŊ RĀǞĴĄŅ


Formatting the namenode means that it deletes all the block IDs present in that location. – ǨÅVËĔŊ RĀǞĴĄŅ Nov 26 '14 at 09:55
Thanks Kaveen, I am trying to follow the steps but getting issue. I have created this post with issue details, can you please check it once --> http://stackoverflow.com/questions/27147096/hadoop-put-command-throws-could-only-be-replicated-to-0-nodes-instead-of-1 – learner Nov 26 '14 at 10:46
