Perfomance improvement for hive query

Asked Mar 28 '17 at 13:26

Active Sep 16 '17 at 07:23

Viewed 54 times

I am using multiple union all and then doing the sum of each column, but this query runs like forever. I have 96GB memory cluster. Please tell me what should i do for performance improvement. Following is my query in hive.

total as
(
select * from
(
   select * from table1
   union all
   select * from table2
   union all
   select * from table3
   union all
   select * from table4
   union all
   select * from table5
   union all
   select * from table6
   union all
   select * from table7
   union all
   select * from table8
   union all
   select * from table9
)p
)


Select * from
(
select
sum(col_1),
sum(col_2),
sum(col_3),
sum(col_4),
sum(col_5),
sum(col_6),
sum(col_7),
sum(col_8),
sum(col_9),
sum(col_10)
from total
)q;

edited Sep 22 '17 at 17:48

Community

asked Mar 28 '17 at 13:26

Nikki

The second query doesn't need a select star... And hive is slow anyway, so why not use Spark? Or are you using Tez engine? – OneCricketeer Mar 28 '17 at 13:33
I am using Tez engine. Still it is running like forever. – Nikki Mar 28 '17 at 13:46
try to increase parallelism (increase number of mappers and reducers), see here: http://stackoverflow.com/a/42842117/2700344 – leftjoin Mar 28 '17 at 14:44
3

Why merging then aggregating, instead of pre-aggregating *and* merging *and* post-aggregating in one shot -- i.e. `select sum(xcol1), ... from (select sum(col1) as xcol1, ... from T union all ...) x` – Samson Scharfrichter Mar 28 '17 at 15:15

Perfomance improvement for hive query

0 Answers0