14

How do I wipe out the DFS in Hadoop?

biznez
  • 3,911
  • 11
  • 33
  • 37

7 Answers

16

You need to do two things:

  1. Delete the main Hadoop storage directory from every node. This directory is defined by the hadoop.tmp.dir property in your hdfs-site.xml (a sketch of doing this across all nodes follows below).

  2. Reformat the namenode:

hadoop namenode -format

If you only do (2), it will only remove the metadata stored by the namenode, but won't get rid of all the temporary storage and datanode blocks.
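A minimal sketch of step 1, not part of the original answer: it assumes the worker hostnames are listed in conf/slaves, passwordless ssh is set up, and hadoop.tmp.dir points at /app/hadoop/tmp (substitute whatever your own config actually says):

# run from the namenode; clears the hadoop.tmp.dir contents on every node
for node in $(cat conf/slaves); do
  ssh "$node" 'rm -rf /app/hadoop/tmp/*'
done
rm -rf /app/hadoop/tmp/*   # don't forget the namenode itself

# then reformat the namenode
bin/hadoop namenode -format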

romedius
  • 775
  • 6
  • 20
Eduard
  • 3,482
  • 2
  • 27
  • 45
  • deleting the main hadoop storage directory from every single node is not feasible! – Mehraban Dec 10 '13 at 11:04
  • performing namenode -format will delete all the metadata and also make your cluster unusable. This is not an advisable option. – Karthik Apr 21 '15 at 02:45
  • Also, namenode -format will generate a new cluster ID for the namenode, and all the other daemons will not be able to communicate with the namenode. Please update your answer to avoid misguiding readers. Thanks – Karthik Apr 21 '15 at 02:49
10
hdfs dfs -rm -r "/*"

(the command in the old answer has since been deprecated)
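Note that if HDFS trash is enabled, -rm -r may only move everything into the .Trash directory rather than freeing the space. A hedged variant that bypasses the trash (the -skipTrash flag is standard, but check your version):

hdfs dfs -rm -r -skipTrash "/*"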

Jonathan Graehl
  • 9,182
  • 36
  • 40
10
bin/hadoop namenode -format
SquareCog
  • 19,421
  • 8
  • 49
  • 63
  • 3
    Watch out: existing old datanodes won't work with this newly formatted dfs. See http://issues.apache.org/jira/browse/HDFS-107 – Leonidas Jan 25 '10 at 12:31
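A common workaround, sketched here rather than taken from the answer: clear each datanode's data directory (whatever dfs.data.dir, or dfs.datanode.data.dir on newer releases, points to) before restarting, so the datanode re-registers cleanly with the freshly formatted namenode:

# run on every datanode; /app/hadoop/data is a placeholder for your dfs.data.dir value
rm -rf /app/hadoop/data/*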
3

You may issue

hadoop fs -rmr /

This would delete all directories and sub-directories under DFS.

Another option is to stop your cluster and then issue:

hadoop namenode -format

This would erase all contents on DFS; start the cluster again afterwards.
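A minimal sketch of that second option as a single sequence, assuming the standard scripts under your Hadoop directory:

bin/stop-all.sh
bin/hadoop namenode -format
bin/start-all.sh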

techlad
  • 103
  • 1
  • 1
  • 5
3

So this is what I have had to do in the past.

1. Navigate to your Hadoop directory on your NameNode, then stop all the Hadoop processes by running the default stop-all script. This will also stop DFS. e.g.

cd myhadoopdirectory
bin/stop-all.sh

2. Now, on every machine in your cluster (namenodes, JobTrackers, datanodes etc.), delete all files in your main Hadoop storage; mine is set to the temp folder in the root folder. Yours can be found in conf/hdfs-site.xml under the hadoop.tmp.dir property, e.g.

cd /temp/
rm -r *

3. Finally, go back to your namenode and format it by going to your Hadoop directory and running 'bin/hadoop namenode -format', e.g.

cd myhadoopdirectory
bin/hadoop namenode -format

4. Start up your cluster again by running the following command. It will also start up DFS again.

bin/start-all.sh

5. And it should work.
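For reference, the whole procedure condensed into one sketch, run from the namenode. It assumes /temp is your hadoop.tmp.dir, the worker hostnames are in conf/slaves, and passwordless ssh is available; adjust the paths to your own setup:

cd myhadoopdirectory
bin/stop-all.sh
for node in $(cat conf/slaves); do ssh "$node" 'rm -rf /temp/*'; done
rm -rf /temp/*                       # clear the namenode's own storage too
bin/hadoop namenode -format
bin/start-all.sh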

jonhurlock
  • 1,798
  • 1
  • 18
  • 28
1
  1. You need to call bin/stop-all.sh to stop dfs and mapreduce.
  2. Delete the data dir which is configured in conf/hdfs-site.xml and conf/mapred-site.xml.
  3. Make sure that you have also deleted the temporary files under the /tmp dir (see the sketch below).

After all the above steps, you can call bin/hadoop namenode -format to regenerate the dfs.
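By default those temporaries live under /tmp/hadoop-<username> (the stock hadoop.tmp.dir value), so a sketch of step 3 could look like this; adjust the pattern if you have changed the property:

rm -rf /tmp/hadoop-$(whoami)*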

Charles Menguy
  • 40,830
  • 17
  • 95
  • 117
1
  1. Stop your cluster

    ${HADOOP_HOME}/bin/stop-mapred.sh

    ${HADOOP_HOME}/bin/stop-dfs.sh

    or if it's pseudo-distributed, simply issue:

    ${HADOOP_HOME}/bin/stop-all.sh

  2. Format your hdfs

    hadoop namenode -format
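
A hedged follow-up, not part of the original answer: once the format has finished, bring the cluster back up with the matching start scripts:

    ${HADOOP_HOME}/bin/start-dfs.sh

    ${HADOOP_HOME}/bin/start-mapred.sh

    or, for pseudo-distributed:

    ${HADOOP_HOME}/bin/start-all.sh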

stholy
  • 322
  • 1
  • 7
  • 12