4

I will have 200 million files in my HDFS cluster. As I understand it, each file occupies about 150 bytes in NameNode memory, plus 3 blocks per file, for a total of about 600 bytes per file in the NN. So I plan to give my NN 250 GB of memory to comfortably handle 200 million files. My question: will such a large memory size of 250 GB put too much pressure on GC? Is it feasible to give the NN 250 GB of memory?
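For reference, the arithmetic above can be sketched as follows. This is only a back-of-the-envelope estimate: the 150 bytes per object and 3 blocks per file are the rule-of-thumb figures from the question, not measured values, and the function name is made up for illustration.

```python
# Rule-of-thumb figures taken from the question (estimates, not measurements).
BYTES_PER_OBJECT = 150   # assumed cost of one file or block object in NN memory
BLOCKS_PER_FILE = 3      # assumed number of blocks per file
NUM_FILES = 200_000_000


def namenode_metadata_bytes(num_files,
                            blocks_per_file=BLOCKS_PER_FILE,
                            bytes_per_object=BYTES_PER_OBJECT):
    """Estimate NameNode memory used by file and block objects."""
    objects_per_file = 1 + blocks_per_file  # the file itself plus its blocks
    return num_files * objects_per_file * bytes_per_object


total = namenode_metadata_bytes(NUM_FILES)
print(f"{total / 1e9:.0f} GB")     # 120 GB (decimal gigabytes)
print(f"{total / 2**30:.1f} GiB")  # 111.8 GiB
```

Under these assumptions the metadata alone comes to roughly 120 GB, so 250 GB leaves about half of the memory as headroom.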

Can someone please say something? Why is nobody answering?
Ani Menon
Jack
Probably because fine-tuning configuration has no single right answer and requires a deep analysis of your cluster. Also, your question seems to be about GC rather than what the title implies, which is misleading. – fd8s0 Jun 13 '16 at 14:29

2 Answers

2

You can have 256 GB of physical memory in your NameNode. If your data grows to very large volumes, consider HDFS federation. I assume the NameNode host already has multiple cores (with or without hyper-threading). The link below should address your GC concerns: https://community.hortonworks.com/articles/14170/namenode-garbage-collection-configuration-best-pra.html

Marco99
2

The ideal NameNode memory size is roughly the total space used by the metadata of your data, plus the OS, plus the daemons, plus 20-30% extra for processing-related data.
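As a rough sketch of that sizing rule: the function below adds the components and applies the headroom on top of the subtotal, which is one possible reading of "20-30%". The OS and daemon figures in the example are placeholder assumptions, and the 120 GB metadata figure is simply 200 million files times 600 bytes from the question.

```python
def recommended_memory_gb(metadata_gb, os_gb, daemons_gb, headroom_fraction):
    """Sum the components, then add headroom on top of the subtotal.

    headroom_fraction is the extra share for processing-related data,
    e.g. 0.20 to 0.30 (interpreted here as a fraction of the subtotal).
    """
    if not 0 <= headroom_fraction < 1:
        raise ValueError("headroom_fraction must be in [0, 1)")
    subtotal = metadata_gb + os_gb + daemons_gb
    return subtotal * (1 + headroom_fraction)


# Hypothetical inputs: 120 GB metadata, 8 GB OS, 8 GB daemons.
low = recommended_memory_gb(120, 8, 8, 0.20)
high = recommended_memory_gb(120, 8, 8, 0.30)
print(f"{low:.1f} GB to {high:.1f} GB")  # 163.2 GB to 176.8 GB
```

With these placeholder numbers the estimate stays well below 256 GB; your own OS and daemon footprints may differ.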

You should also consider the rate at which data comes into your cluster. If you have data coming in at 1 TB/day, then you must plan for more memory, or you will soon run out.

It's always advised to have at least 20% of memory free at any point in time. This helps avoid the NameNode going into a full garbage collection. As Marco mentioned earlier, you may refer to "NameNode Garbage Collection Configuration: Best Practices and Rationale" for the GC configuration.
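To combine the ingest rate with the 20% free-memory guideline, here is a minimal sketch that estimates how many days remain before usage crosses 80% of the NameNode memory. The average file size is a hypothetical input, the 128 MB block size and 150 bytes per object are assumed defaults, and both function names are made up for illustration.

```python
import math


def metadata_bytes_per_day(ingest_bytes_per_day, avg_file_bytes,
                           block_bytes=128 * 2**20, bytes_per_object=150):
    """Estimate how much NameNode memory one day of ingest adds."""
    files_per_day = ingest_bytes_per_day / avg_file_bytes
    blocks_per_file = math.ceil(avg_file_bytes / block_bytes)
    return files_per_day * (1 + blocks_per_file) * bytes_per_object


def days_until_threshold(total_bytes, used_bytes, growth_bytes_per_day,
                         min_free_fraction=0.20):
    """Days until used memory exceeds (1 - min_free_fraction) of the total."""
    remaining = total_bytes * (1 - min_free_fraction) - used_bytes
    if remaining <= 0:
        return 0.0
    if growth_bytes_per_day <= 0:
        return math.inf
    return remaining / growth_bytes_per_day


# 1 TB/day of ingest, with two hypothetical average file sizes.
for avg_file in (100e6, 1e6):  # 100 MB files vs. 1 MB files
    growth = metadata_bytes_per_day(1e12, avg_file)
    days = days_until_threshold(250e9, 120e9, growth)
    print(f"avg file {avg_file / 1e6:.0f} MB: about {days:.0f} days")
```

The point of the comparison is that the number of files and blocks, not the raw data volume, drives NameNode memory growth: the same 1 TB/day consumes memory far faster when it arrives as many small files.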

In your case, 256 GB looks good if you aren't going to receive a lot of data and aren't going to do lots of operations on the existing data.

Refer: How to Plan Capacity for Hadoop Cluster?

Also refer: Select the Right Hardware for Your New Hadoop Cluster

Ani Menon