We have an INSERT query that loads data into a partitioned table by reading from a non-partitioned table.
Query -
insert into db1.fact_table PARTITION(part_col1, part_col2)
( col1,
col2,
col3,
col4,
col5,
col6,
...
col32,
LOAD_DT,
part_col1,
part_col2 )
select
col1,
col2,
col3,
col4,
col5,
col6,
...
col32,
part_col1,
part_col2
from db1.main_table WHERE col1=0;
The table has 34 columns. The number of records in the main table depends on the size of the input file we receive daily, and the number of partitions (part_col1, part_col2) we insert in each run varies from roughly 4000 to 5000.
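For context on what that scale means: with dynamic partitioning, each map task can open a file writer for every partition value it encounters, so a run that lands in 4000 to 5000 partitions can mean thousands of open writers and files per task. One rewrite I have been wondering about, but have not tested, is to distribute the rows by the partition columns so that each reducer only writes a few partitions; the column list here is abbreviated:
insert into db1.fact_table PARTITION(part_col1, part_col2)
select
col1,
col2,
...
col32,
part_col1,
part_col2
from db1.main_table WHERE col1=0
DISTRIBUTE BY part_col1, part_col2;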
The original query sometimes fails with the error below.
2019-04-28 13:23:31,715 Stage-1 map = 95%, reduce = 0%, Cumulative CPU 177220.23 sec
2019-04-28 13:24:25,989 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 163577.82 sec
MapReduce Total cumulative CPU time: 1 days 21 hours 26 minutes 17 seconds 820 msec
Ended Job = job_1556004136988_155295 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1556004136988_155295_m_000003 (and more) from job job_1556004136988_155295
Examining task ID: task_1556004136988_155295_m_000004 (and more) from job job_1556004136988_155295
Task with the most failures(4):
-----
Task ID: task_1556004136988_155295_m_000000
-----
Diagnostic Messages for this Task:
Exception from container-launch.
Container id: container_e81_1556004136988_155295_01_000015
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:563)
    at org.apache.hadoop.util.Shell.run(Shell.java:460)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:748)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:305)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:356)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:88)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Shell output: main : command provided 1
main : user is bldadmin
main : requested yarn user is bldadmin
Container exited with a non-zero exit code 255
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 10   Cumulative CPU: 163577.82 sec   MAPRFS Read: 0   MAPRFS Write: 0   FAIL
Total MapReduce CPU Time Spent: 1 days 21 hours 26 minutes 17 seconds 820 msec
Current Hive properties.
Using the Tez engine -
set hive.execution.engine=tez;
set hive.tez.container.size=3072;
set hive.tez.java.opts=-Xmx1640m;
set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=false;
set hive.enforce.bucketing=true;
set hive.exec.parallel=true;
set hive.auto.convert.join=false;
set hive.enforce.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.optimize.bucketmapjoin=true;
set hive.exec.tmp.maprfsvolume=false;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.stats.fetch.partition.stats=true;
set hive.support.concurrency=true;
set hive.exec.max.dynamic.partitions=999999999;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on=true;
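One thing I am not sure about: hive.tez.java.opts here is only a little over half of hive.tez.container.size, while the guidance I have usually seen is to keep the Tez heap at roughly 80% of the container size. A sketch of what a larger container with that ratio would look like (the exact values are my assumption, not something we have validated):
-- hypothetical resizing, roughly 80% heap-to-container ratio
set hive.tez.container.size=4096;
set hive.tez.java.opts=-Xmx3276m;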
Based on input from other teams, we changed the engine to MR; the properties are -
set hive.execution.engine=mr;
set hive.auto.convert.join=false;
set mapreduce.map.memory.mb=16384;
set mapreduce.map.java.opts=-Xmx14745m;
set mapreduce.reduce.memory.mb=16384;
set mapreduce.reduce.java.opts=-Xmx14745m;
With these properties, the query completed without any errors a few times.
How can I debug this issue, and are there any Hive properties we can set so that we don't hit it in the future?
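For reference, a sketch of additional settings that look related to dynamic-partition memory pressure and that we have not tried yet; the property names are standard Hive settings, but the values are illustrative assumptions:
-- sort rows by the partition key so each task keeps only one writer open at a time
set hive.optimize.sort.dynamic.partition=true;
-- hypothetical per-node limit sized to the 4000-5000 partitions we create per run
set hive.exec.max.dynamic.partitions.pernode=5000;
-- hypothetical cap on the total number of files one job may create
set hive.exec.max.created.files=500000;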