After shutting down the cluster with ./stop-all.sh and then running hadoop namenode -format, I see that the datanodes still use the same amount of disk space, i.e. the space has not been freed up.
Why is that?
You can manually delete the data on the DataNodes before formatting the NameNode, using rmr:
rmr
Usage: hadoop fs -rmr URI [URI …]
Recursive version of delete. Example:
hadoop fs -rmr /user/hadoop/dir
hadoop fs -rmr hdfs://nn.example.com/user/hadoop/dir
Exit Code:
Returns 0 on success and -1 on error.
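A minimal sketch of the delete-then-format order in POSIX sh, assuming Hadoop 1.x-style commands (hadoop fs -rmr, ./stop-all.sh, hadoop namenode -format). The function name reset_hdfs and the HADOOP / HADOOP_BIN_DIR variables are hypothetical; they exist only so the commands can be pointed at a particular installation. The -skipTrash flag is an assumption about your version's FsShell: without it, and with trash enabled, the files are only moved to .Trash and keep using space.

```shell
#!/bin/sh
# Sketch: delete HDFS data while the cluster is still running, then stop
# the cluster and reformat the NameNode.
# HADOOP and HADOOP_BIN_DIR are hypothetical knobs; the defaults are
# "hadoop" on PATH and the current directory for stop-all.sh.

reset_hdfs() {
  hadoop_cmd="${HADOOP:-hadoop}"
  bin_dir="${HADOOP_BIN_DIR:-.}"

  # 1. Delete each given path through HDFS so the DataNodes drop the blocks.
  #    -skipTrash (assumed supported) avoids parking the files in .Trash.
  for path in "$@"; do
    "$hadoop_cmd" fs -rmr -skipTrash "$path" || return 1
  done

  # 2. Stop MapReduce and HDFS.
  "$bin_dir/stop-all.sh" || return 1

  # 3. Reformat the NameNode (it may ask for confirmation).
  "$hadoop_cmd" namenode -format
}
```

For example, reset_hdfs /user/hadoop/dir. The DataNodes remove deleted blocks asynchronously, after the NameNode tells them to, so it may be worth checking that the space has actually gone down (e.g. with hadoop dfsadmin -report) before stopping the cluster.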
Alternatively, reformat the DataNodes whenever you reformat the NameNode. I see two approaches here:
Formatting the NameNode does not clean up the space on the DataNodes; you have to do that manually.
To do that, first stop the cluster by running ./stop-all.sh, or ./stop-mapred.sh followed by ./stop-dfs.sh.
Then delete the DataNode's data directory, i.e. either the directory specified by dfs.data.dir in hdfs-site.xml or, if that is not set, ${hadoop.tmp.dir}/dfs/data.
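The manual cleanup can be sketched as a small POSIX sh function. The name wipe_datanode_dirs is hypothetical, and the sketch assumes you pass in the values of dfs.data.dir and hadoop.tmp.dir yourself rather than parsing the XML config files. It treats dfs.data.dir as a comma-separated list and falls back to ${hadoop.tmp.dir}/dfs/data when it is empty. Run it on every DataNode, and only after the cluster has been stopped.

```shell
#!/bin/sh
# Sketch: delete DataNode storage directories after the cluster is stopped.
# Usage (hypothetical helper):
#   wipe_datanode_dirs "<dfs.data.dir value, may be empty>" "<hadoop.tmp.dir value>"

wipe_datanode_dirs() {
  data_dirs="$1"
  tmp_dir="$2"

  # dfs.data.dir defaults to ${hadoop.tmp.dir}/dfs/data when it is not set.
  if [ -z "$data_dirs" ]; then
    if [ -z "$tmp_dir" ]; then
      # Refuse to guess, otherwise we would try to delete /dfs/data.
      echo "neither dfs.data.dir nor hadoop.tmp.dir given" >&2
      return 1
    fi
    data_dirs="$tmp_dir/dfs/data"
  fi

  # dfs.data.dir may list several directories separated by commas.
  old_ifs="$IFS"
  IFS=','
  for dir in $data_dirs; do
    echo "removing $dir"
    rm -rf -- "$dir"
  done
  IFS="$old_ifs"
}
```

With dfs.data.dir unset and the default hadoop.tmp.dir of /tmp/hadoop-${user.name}, that would be wipe_datanode_dirs "" "/tmp/hadoop-$USER".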
Running -rmr before formatting (as suggested in one of the other answers to this question) is actually the best way, unless, like me, you unknowingly formatted the NameNode and only THEN realized that the DataNode space doesn't get cleaned up ;)
Formatting the NameNode won't format the DataNodes.
It only formats the contents of your NameNode, i.e. your NameNode will no longer know where your data is. Also, namenode -format assigns a new namespace ID to the NameNode.
You will have to change the namespaceID on each DataNode to match before the DataNode will work again. It is stored in dfs/data/current/VERSION.
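A sketch of that fix, copying the ID from the NameNode's VERSION file into a DataNode's. It assumes the Hadoop 1.x layout, where the NameNode keeps its copy in dfs/name/current/VERSION (under dfs.name.dir) and each DataNode in dfs/data/current/VERSION, and that the NameNode's file has already been copied to the DataNode host. The function name sync_namespace_id is hypothetical.

```shell
#!/bin/sh
# Sketch: make a DataNode's namespaceID match the NameNode's.
# Run with the DataNode stopped. sync_namespace_id is a hypothetical helper.

sync_namespace_id() {
  nn_version="$1"   # NameNode VERSION file (source of truth)
  dn_version="$2"   # DataNode VERSION file to patch

  # Extract the value of the namespaceID=... line.
  nsid=$(sed -n 's/^namespaceID=//p' "$nn_version")
  if [ -z "$nsid" ]; then
    echo "no namespaceID in $nn_version" >&2
    return 1
  fi

  # Rewrite through a temp file instead of sed -i, whose syntax differs
  # between GNU and BSD sed. All other lines are kept unchanged.
  sed "s/^namespaceID=.*/namespaceID=$nsid/" "$dn_version" > "$dn_version.tmp" &&
    mv "$dn_version.tmp" "$dn_version"
}
```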
There is an open JIRA, HDFS-107, suggesting that the DataNodes be formatted as well when the NameNode is formatted.