-1

I have 2 tables in hive. Table A has 300M rows and Table B has 26M rows. I am joining Table A and Table B on 3 columns col1,col2,col3.

Below is the query I am using

create temporary table AB_TEMP AS select A.col1,A.col2,A.col3,A.col4,A.col5 from A join B on A.col1=B.col1 and A.col2=B.col2 and A.col3=B.col3;

I am getting an error called vertex failure every time I run this query.

What to do to overcome this issue?

Below is the error that I am getting

Status: Failed Vertex failed, vertexName=Map 1, vertexId=vertex_1617665530644_1398582_10_01, diagnostics=[Task failed, taskId=task_1617665530644_1398582_10_01_000147, diagnostics=[TaskAttempt 0 failed, info=[AttemptID:attempt_1617665530644_1398582_10_01_000147_0 Timed out after 300 secs], TaskAttempt 1 failed, info=[AttemptID:attempt_1617665530644_1398582_10_01_000147_1 Timed out after 300 secs], TaskAttempt 2 failed, info=[AttemptID:attempt_1617665530644_1398582_10_01_000147_2 Timed out after 300 secs], TaskAttempt 3 failed, info=[Container container_e42_1617665530644_1398582_01_002060 timed out]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:220, Vertex vertex_1617665530644_1398582_10_01 [Map 1] killed/failed due to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1617665530644_1398582_10_01, diagnostics=[Task failed, taskId=task_1617665530644_1398582_10_01_000147, diagnostics=[TaskAttempt 0 failed, info=[AttemptID:attempt_1617665530644_1398582_10_01_000147_0 Timed out after 300 secs], TaskAttempt 1 failed, info=[AttemptID:attempt_1617665530644_1398582_10_01_000147_1 Timed out after 300 secs], TaskAttempt 2 failed, info=[AttemptID:attempt_1617665530644_1398582_10_01_000147_2 Timed out after 300 secs], TaskAttempt 3 failed, info=[Container container_e42_1617665530644_1398582_01_002060 timed out]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:220, Vertex vertex_1617665530644_1398582_10_01 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0

KVHP
  • 33
  • 1
  • 7
  • Please provide exception message from failed container – leftjoin May 05 '21 at 11:41
  • If you need columns only from table A, consider using LEFT SEMI JOIN instead of INNER JOIN, it will not duplicate records if table B contains multiple matches. – leftjoin May 05 '21 at 11:45
  • I agree that we can use left semi join to avoid duplicates. This issue is not because of duplicates, it is because of the data size itself. I have added few parameters and ran it in map reduce and it worked – KVHP May 07 '21 at 11:48
  • I already asked you to provide failed container logs. Please do. Without logs it is impossible to fix – leftjoin May 07 '21 at 11:52
  • I have got the solution on how to achieve this and posted as an answer. Thanks for the help @leftjoin – KVHP May 07 '21 at 11:54
  • This is console log, not very informative and it seems there is no memory issue, at least in your log, it is "Timed out after 300 secs". Failed container log may contain more information. To fix timeout issue use these settings set mapreduce.task.timeout=1200000; set tez.session.client.timeout.secs=900; – leftjoin May 07 '21 at 15:35

2 Answers2

1

Don't execute this query on tez. We can complete this in Map Reduce.

set hive.execution.engine=mr;
set hive.auto.convert.join=false;
set mapreduce.map.memory.mb=2048;
set mapreduce.reduce.memory.mb=4096;

After setting all the above mentioned parameters, you can run the code and it executes fine

KVHP
  • 33
  • 1
  • 7
  • Hey! this is not a good solution. You can set memory for Tez as well – leftjoin May 07 '21 at 11:54
  • Can you tell me how to do that? – KVHP May 07 '21 at 11:55
  • Sure. Posted answer with example. You can try your figures – leftjoin May 07 '21 at 12:00
  • One needs to know what exactly failing to fix it. Question like this, without details is not useful for the community because there is no universal cure for all kind of failures. – leftjoin May 07 '21 at 12:03
  • Also you can increase parallelizm on mapper or reducer instead of increasin container size. There are many ways to fix it if you know what exactly failing – leftjoin May 07 '21 at 12:11
  • I have added the exact error that I am getting in the question – KVHP May 07 '21 at 13:25
  • @leftjoin As you have mentioned that we can increase parallelism on mapper or reducer instead of increasing container size. Can you tell me how to do this, as this may reduce the time taken by other queries that I use. – KVHP May 07 '21 at 13:26
  • Read here: https://stackoverflow.com/a/48487306/2700344 – leftjoin May 07 '21 at 15:39
1

Memory settings for Tez (figures are just an example):

--For AM
set tez.am.resource.memory.mb=8192;
set tez.am.java.opts=-Xmx6144m;
--Mapper and Reducer
set tez.reduce.memory.mb=6144;
set hive.tez.container.size=9216;
set hive.tez.java.opts=-Xmx6144m;
leftjoin
  • 36,950
  • 8
  • 57
  • 116