We are supporting an application which are running huge hive queries triggered via ETL tool.
The query after the mapping runs on hive. The query is very big but its structure looks like this.
INSERT INTO Table2
Select t1.f0, t1.f1,
infUDFCallDouble('TO_FLOAT', t1.f2) as f2,
SUBSTR(SUBSTR(toString(t1.f4, 'YYYY-MM-DD'), (CASE WHEN 0 < (-
LENGTH(toString(t1.f4, 'YYYY-MM-DD'))) THEN 0 ELSE 0 END), 10), (CASE WHEN 0 < (- LENGTH(SUBSTR(toString(t1.f4, 'YYYY-MM-DD'), (CASE WHEN 0 < (- LENGTH(toString(t1.f4, 'YYYY-MM-DD'))) THEN 0 ELSE 0 END), 10))) THEN 0 ELSE 0 END), 10) as f4,
f5, .....
FROM
(
Select t1.f0, t1.f1...
FROM
(
SELECT CAST(t1.cust_id AS STRING) as f0, ....
...
FROM sw.v_cust_info_pr t1
)
group by t1.f0, t1.f1
.....t1.f50 )
t1 ;
I surely don't think, they need to scan the same view multiple times Is there any simpler way to write this. Or surely multiple stage mapping can be merged, is my guess. apart from that, is there any other simpler way to approach the same. I am new and on support side, but this query runs for hours, so need to check if its running in the right way. i am sure one inner select can be discarded, anything else i can try.