0

I have two tables. Table 1 is a large one, and Table 2 is a small one. I would like to extract data from Table 1 if values in Table1.column1 matches those in Table2.column1. Both Table 1 and Table 2 have column, column1. Here is my code.

select *
from Table1
where condition1
and condition2
and column1 in (select column1 from Table2)

Condition 1 and Condition 2 are meant to restrict the size of the table to be extracted. Not sure if this actually works. Then I got execution error, return code 1. I am on Hue platform.

EDIT

As suggested by @yammanuruarun, I tried the following code.

SELECT *
FROM
  (SELECT *
   FROM Table1
   WHERE condition1
     AND condition2) t1
INNER JOIN Table2 ON t1.column1 = t2.column1

Then, I got the following error.

Error while processing statement: FAILED: Execution Error, return code 2 from 

org.apache.hadoop.hive.ql.exec.tez.TezTask. Application 

application_1580875150091_97539 failed 2 times due to AM Container for 

appattempt_1580875150091_97539_000002 exited with exitCode: 255 Failing this 

attempt.Diagnostics: [2020-02-07 14:35:53.944]Exception from container-launch.

Container id: container_e1237_1580875150091_97539_02_000001 Exit code: 255

Exception message: Launch container failed Shell output: main : command provided 1

 main : run as user is hive main : requested yarn user is hive Getting exit code

 file... Creating script paths... Writing pid file... Writing to tmp file /disk-
11/hadoop/yarn/local/nmPrivate/application_1580875150091_97539/container_e1237_1580875150091_97539_02_000001/container_e1237_1580875150091_97539_02_000001.pid.tmp

Writing to cgroup task files... Creating local dirs... Launching container... 

Getting exit code file... Creating script paths... [2020-02-07 14:35:53.967]Container exited with a non-zero exit code 255. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : Last 4096 bytes of stderr : 

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in 

thread "IPC Server idle connection scanner for port 26888" Halting due to Out Of 

Memory Error... Halting due to Out Of Memory Error... Halting due to Out Of Memory 
Error... 

Halting due to Out Of Memory Error... Halting due to Out Of Memory Error... 

Halting due to Out Of Memory Error... Halting due to Out Of Memory Error... 

Halting due to Out Of Memory Error... [2020-02-07 14:35:53.967]Container exited

 with a non-zero exit code 255. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : Last 4096 bytes of stderr : 

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in 

thread "IPC Server idle connection scanner for port 26888" Halting due to Out Of Memory Error... Halting due to Out Of Memory Error... 

Halting due to Out Of Memory Error... Halting due to Out Of Memory Error...

 Halting due to Out Of Memory Error... Halting due to Out Of Memory Error... 

Halting due to Out Of Memory Error... Halting due to Out Of Memory Error... 

For more detailed output, check the application tracking page: http://dcwipphm12002.edc.nam.gm.com:8088/cluster/app/application_1580875150091_97539 Then click on links to logs of each attempt. . Failing the application.

Looks like it is a memory error. Is there any way I could optimize my query?

ycenycute
  • 688
  • 4
  • 10
  • 20
  • 1. If you have access can you provide or check the actual error message in the job tracker. You should be having the application number when you run the job. Just the return code of the job will not be too much helpful. You might also find the job tracker url from the log which gets pastedabove your return code error. 2. Apart from this, is it fine for you to go with join? select count(*) from Table1 t1 where condition1 and condition2 Inner Join Table2 t2 on t1.column1=t2.column1; Since you need only the records matching from both tables based on provided condition I used inner join. – yammanuruarun Feb 07 '20 at 18:16
  • @yammanuruaruny Thanks for the tips. Could you please advise on how to see the actual error message? I do have the application ID. Also, the code of inner join does not work. Looks like I am unable to do join after where conditions. – ycenycute Feb 07 '20 at 18:45
  • The query i gave in above comment is wrong. Where conditions should come after join. With proper alias both of these queries are working fine for me. select count(*) from Table1 t1 Inner Join Table2 t2 on t1.column1=t2.column1 where condition1 and condition2; select count(*) from Table1 t1 where condition1 and condition2 and t1.column1 in (select t2.column1 from Table2 t2); – yammanuruarun Feb 07 '20 at 19:44
  • Right above Query History/Results, the logs of the hive query job will be printed as shown in the image link referred as "Where to see result in text or chart". In this log you will find some job info along with job_id(something like job_124353131534), Kill command to kill this hadoop job, Tracking URL or the url to track the job etc. You will need to copy Tracking url and paste it in browser to see the map reduce logs and then look failed mapper/reducer log. https://static.packt-cdn.com/products/9781788995092/graphics/99270c5c-a497-4ba9-99bc-68048061a72f.png – yammanuruarun Feb 07 '20 at 19:45
  • @yammanuruarun Thanks. Table 1 is very large and receiving data all the time, I don't think we could join first then do the where conditions. I tried another to do the inner join (see the edited part in the question), but still got errors. – ycenycute Feb 07 '20 at 21:49
  • Looks like tez is used as the execution engine for hive which is out of my expertise. Whatever i suggested above was for yarn or map-reduce as execution engine. You may want to look the blogs for tez memory issues while running join to get better help!!! Thank You. https://community.cloudera.com/t5/Support-Questions/Hive-job-fail-on-TEZ-due-to-out-of-memory/td-p/120873 https://stackoverflow.com/questions/48484391/execute-hive-query-with-in-clause-parameters-in-parallel/48487306#48487306 https://stackoverflow.com/questions/55919060/issue-in-hive-query-due-to-memory – yammanuruarun Feb 08 '20 at 05:11

0 Answers0