I run MR jobs on my cluster (Hadoop 1.2.1).
My MR application first splits the input data into multiple partitions (128^2 ~ 512^2) in the first Map/Reduce phase, then processes each partition in the second Map phase.
Since processing each partition requires quite a large amount of memory, I increased the number of partitions (128^5 ~ 512^2). After that, I started getting the following errors:
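For reference, the first phase assigns one reduce task per partition, roughly like the driver sketch below. (This is a simplified illustration, not my real code; the class name and the 512^2 value are just placeholders.)

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class PartitionJobDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(PartitionJobDriver.class);
            conf.setJobName("first-phase-partitioning");

            // One reduce task per partition; 512^2 here stands in for
            // the upper end of the range I use.
            conf.setNumReduceTasks(512 * 512);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            // Mapper/reducer classes omitted for brevity; the old API
            // falls back to the identity classes if none are set.
            JobClient.runJob(conf);
        }
    }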
Error #1
Job initialization failed: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.TreeSet.<init>(TreeSet.java:124)
    at org.apache.hadoop.mapred.TaskInProgress.<init>(TaskInProgress.java:105)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:745)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3890)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Error #2
Failure Info: Job initialization failed: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.StringBuffer.toString(StringBuffer.java:671)
    at org.apache.hadoop.fs.Path.toString(Path.java:252)
    at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:75)
    at org.apache.hadoop.mapred.JobInProgress.createSplits(JobInProgress.java:834)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:724)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3890)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
The message seems to say that I need to increase the amount of memory for each map/reduce Java worker.
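If that reading is right, I believe the per-task heap on Hadoop 1.x is raised via mapred.child.java.opts, roughly like the sketch below (the class name and the 2 GB value are just examples, not my real setting):

    import org.apache.hadoop.mapred.JobConf;

    public class HeapSettingSketch {
        // Sketch: raise the per-task JVM heap before submitting the job.
        // 2048 MB is an example figure, not what I actually run with.
        public static JobConf raiseChildHeap(JobConf conf) {
            conf.set("mapred.child.java.opts", "-Xmx2048m");
            return conf;
        }
    }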
However, I can't understand the root cause of these errors, since the OOM is not raised from my application code. Judging from the stack traces, it comes from Hadoop's internal code, in the JobTracker's job-initialization path (JobTracker.initJob / JobInProgress.initTasks).
Does anyone know where this error actually comes from? Thanks.