I recently came across a scenario where a MapReduce job appears to be successful in the RM, whereas the Pig script returns exit code 8, which refers to "Throwable thrown (an unexpected exception)".
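We see the exit code in our wrapper script, roughly like this (the script name is a placeholder, not our real file):
pig -f merge_events.pig
echo "pig exited with code $?"   # prints 8 when this failure occurs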
Here is the full script (added as requested):
REGISTER '$LIB_LOCATION/*.jar';
-- set number of reducers ($REDUCERS = 200)
SET default_parallel $REDUCERS;
SET mapreduce.map.memory.mb 3072;
SET mapreduce.reduce.memory.mb 6144;
SET mapreduce.map.java.opts -Xmx2560m;
SET mapreduce.reduce.java.opts -Xmx5120m;
SET mapreduce.job.queuename dt_pat_merchant;
SET yarn.app.mapreduce.am.command-opts -Xmx5120m;
SET yarn.app.mapreduce.am.resource.mb 6144;
-- load data from the EAP data catalog ($ENV = PROD)
data = LOAD 'eap-$ENV://event'
-- using a custom function
USING com.XXXXXX.pig.DataDumpLoadFunc
('{"startDate": "$START_DATE", "endDate" : "$END_DATE", "timeType" : "$TIME_TYPE", "fileStreamType":"$FILESTREAM_TYPE", "attributes": { "all": "true" } }', '$MAPPING_XML_FILE_PATH');
-- filter out null context entity records
filtered = FILTER data BY (attributes#'context_id' IS NOT NULL);
-- group data by session id
session_groups = GROUP filtered BY attributes#'context_id';
-- flatten events
flattened_events = FOREACH session_groups GENERATE FLATTEN(filtered);
-- remove the output directory if it exists
RMF $OUTPUT_PATH;
-- store results in specified output location
STORE flattened_events INTO '$OUTPUT_PATH' USING com.XXXX.data.catalog.pig.EventStoreFunc();
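For completeness, the script is launched with parameter substitution along these lines (all values below are illustrative placeholders, not our real paths or dates):
pig -param LIB_LOCATION=/path/to/libs \
    -param REDUCERS=200 \
    -param ENV=PROD \
    -param START_DATE=2015-01-01 \
    -param END_DATE=2015-01-02 \
    -param TIME_TYPE=daily \
    -param FILESTREAM_TYPE=event \
    -param MAPPING_XML_FILE_PATH=/path/to/mapping.xml \
    -param OUTPUT_PATH=/path/to/output \
    -f merge_events.pig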
And I can see "ERROR 2998: Unhandled internal error. GC overhead limit exceeded" in the Pig logs (stack trace below).
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.mapreduce.FileSystemCounter.values(FileSystemCounter.java:23)
at org.apache.hadoop.mapreduce.counters.FileSystemCounterGroup.findCounter(FileSystemCounterGroup.java:219)
at org.apache.hadoop.mapreduce.counters.FileSystemCounterGroup.findCounter(FileSystemCounterGroup.java:199)
at org.apache.hadoop.mapreduce.counters.FileSystemCounterGroup.findCounter(FileSystemCounterGroup.java:210)
at org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154)
at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:241)
at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:370)
at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:391)
at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskReports(ClientServiceDelegate.java:451)
at org.apache.hadoop.mapred.YARNRunner.getTaskReports(YARNRunner.java:594)
at org.apache.hadoop.mapreduce.Job$3.run(Job.java:545)
at org.apache.hadoop.mapreduce.Job$3.run(Job.java:543)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapreduce.Job.getTaskReports(Job.java:543)
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.getTaskReports(HadoopShims.java:235)
at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.addMapReduceStatistics(MRJobStats.java:352)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.addSuccessJobStats(MRPigStatsUtil.java:233)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.accumulateStats(MRPigStatsUtil.java:165)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:360)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:282)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1431)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1416)
at org.apache.pig.PigServer.execute(PigServer.java:1405)
at org.apache.pig.PigServer.executeBatch(PigServer.java:456)
at org.apache.pig.PigServer.executeBatch(PigServer.java:439)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:171)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:624)
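Looking at the stack trace, the OutOfMemoryError seems to be thrown in the Pig client JVM itself (in MRJobStats.addMapReduceStatistics, while Pig fetches task reports for the already-completed job), not in the map or reduce tasks. I assume that is why the RM shows the job as succeeded even though the script fails.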
(The memory configuration in the script is the same as in the SET statements at the top.)
The status of the job in the cluster's RM says it succeeded [can't post the image as my reputation is too low ;) ]
This issue occurs frequently, and we have to rerun the job until it completes successfully.
Please let me know a fix for this.
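Would something along these lines be the right direction? This is just a sketch: PIG_HEAPSIZE is the documented way to size the Pig client JVM, but pig.stats.notaskreport is a property I only found mentioned elsewhere and I'm not sure our Pig version supports it; the script name is again a placeholder.
# give the Pig client JVM a larger heap (default is about 1 GB; 4096 MB is a guess)
export PIG_HEAPSIZE=4096
# if supported by the Pig version, skip fetching per-task reports
# when Pig builds job statistics after the job finishes
pig -Dpig.stats.notaskreport=true -f merge_events.pig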
PS: The cluster this job runs on is one of the biggest in the world, so I'd say resources and storage space are not a concern.
Thanks