I run MR jobs on my cluster (Hadoop 1.2.1).
My MR application first splits the input data into multiple partitions (128^2 ~ 512^2) in the first Map/Reduce phase, then processes each partition in the second Map phase.
Since processing each partition requires quite a large amount of memory, I increased the number of partitions (128^5 ~ 512^2). After that, I started getting the following errors:
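For reference, the first phase assigns one reduce task per partition, roughly like the driver sketch below. (This is a simplified illustration, not my real code; the class name and the 512^2 value are just placeholders.)

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class PartitionJobDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(PartitionJobDriver.class);
            conf.setJobName("first-phase-partitioning");

            // One reduce task per partition; 512^2 here stands in for
            // the upper end of the range I use.
            conf.setNumReduceTasks(512 * 512);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            // Mapper/reducer classes omitted for brevity; the old API
            // falls back to the identity classes if none are set.
            JobClient.runJob(conf);
        }
    }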
Error #1
Job initialization failed: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.TreeSet.<init>(TreeSet.java:124)
    at org.apache.hadoop.mapred.TaskInProgress.<init>(TaskInProgress.java:105)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:745)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3890)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Error #2
Failure Info: Job initialization failed: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.StringBuffer.toString(StringBuffer.java:671)
    at org.apache.hadoop.fs.Path.toString(Path.java:252)
    at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:75)
    at org.apache.hadoop.mapred.JobInProgress.createSplits(JobInProgress.java:834)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:724)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3890)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
The message seems to say that I need to increase the amount of memory for each map/reduce Java worker.
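If that reading is right, I believe the per-task heap on Hadoop 1.x is raised via mapred.child.java.opts, roughly like the sketch below (the class name and the 2 GB value are just examples, not my real setting):

    import org.apache.hadoop.mapred.JobConf;

    public class HeapSettingSketch {
        // Sketch: raise the per-task JVM heap before submitting the job.
        // 2048 MB is an example figure, not what I actually run with.
        public static JobConf raiseChildHeap(JobConf conf) {
            conf.set("mapred.child.java.opts", "-Xmx2048m");
            return conf;
        }
    }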
However, I can't understand the root cause of these errors, since the OOM is not raised from my application code. Judging from the stack traces, it comes from Hadoop's internal code, in the JobTracker's job-initialization path (JobTracker.initJob / JobInProgress.initTasks).
Does anyone know where this error actually comes from? Thanks.