
I am practicing setting up a Hadoop cluster on Raspberry Pi. In this tutorial (http://www.widriksson.com/raspberry-pi-hadoop-cluster/) the author puts node1 in his Hadoop masters file, which confuses me because he also uses the same node to start the Hadoop daemons. Is this setup ideal? I would also like to know the reason for his configuration.

P.S. Just Ctrl+F for "masters" in the tutorial.

Dean Christian Armada
  • It's not ideal. With Hadoop 2.x high availability, a Standby NameNode takes over the NameNode role if the Active NameNode goes down. Have a look at this related SE question: http://stackoverflow.com/questions/19970461/name-node-vs-secondary-name-node/34716750#34716750 – Ravindra babu Jan 27 '16 at 17:05

1 Answer


No, it is not ideal, but how you configure your cluster is up to you. In this tutorial, the author decided to use node1 as both the Primary NameNode and the Secondary NameNode. Keep in mind that a Raspberry Pi Hadoop cluster is suitable only for development and testing, not for a production environment.
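To make the choice concrete, here is a minimal sketch of the file in question, assuming a Hadoop 1.x layout where `conf/masters` lists the hosts on which `start-dfs.sh` launches the SecondaryNameNode, while the NameNode itself starts on the machine where the script is run. With the tutorial's setting, both end up on node1:

```
node1
```

If you had a spare machine, you could list it in `conf/masters` instead. The hostname `node2` below is only an illustration, not something the tutorial prescribes for this role:

```
node2
```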

Reasons to run the Secondary NameNode on a separate machine from the Primary NameNode (based on an article from Cloudera):

1. Scalability. Creating the system snapshot requires about as much memory as the NameNode itself occupies. Since the memory available to the NameNode process is a primary limit on the size of the distributed filesystem, a large-scale cluster will require most or all of the available memory for the NameNode, leaving too little on the same machine for the Secondary NameNode.

2. Durability. When the Secondary NameNode creates a checkpoint, it does so in a separate copy of the filesystem metadata. Moving this process to another machine also creates a copy of the metadata file on an independent machine, increasing its durability.

Mobin Ranjbar