We have a query in production that runs daily. It performs many joins and also uses window functions in Hive.
We tried adding a few SET options, but that did not improve performance much.
The structure is something like this:
SELECT
    C.f1, C.f2, A.f2 ...
FROM (
    SELECT * FROM (
        SELECT T1.*, B.atid, B.a_id,
               ROW_NUMBER() OVER (PARTITION BY T1.wtid, B.atid ORDER BY T1.b_ts DESC) AS rank_
        FROM T1 AS T1
        JOIN T5 ON T1.t_dt = T5.t_dt
        JOIN T2 B ON T1.wtid = B.wtid AND T1.b_ts = B.b_ts
        LEFT OUTER JOIN (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV
            ON T1.TYP = PV.p_cd
        WHERE T1.state NOT IN ("INVALID")
          AND T1.evt_name NOT IN ('INACTIVE','DORMANT')
          AND ISNULL(PV.p_cd)
    ) T
    WHERE T.rank_ = 1
) A
JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY ac_id ORDER BY b_ts DESC) rank_
      FROM T4
      WHERE event NOT IN ('CT','UPD')
) AS C
    ON A.a_id = C.a_id
   AND A.atid = C.ac_id
   AND C.rank_ = 1
JOIN T6 ON C.t_dt = T6.t_dt
- As I cannot drop any of the tables (or joins), my approach was to replace the window function with another join that uses the aggregate function MAX, but I was not able to rewrite the query that way.
- I am also not sure whether that would actually improve performance, so any guidance would help.
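
For reference, the MAX-based rewrite described in the first point might look roughly like the sketch below, shown only for the T4 subquery (alias C). It is an illustration under assumptions, not a tested replacement: it assumes b_ts is unique within each ac_id after the event filter (if two rows tie on b_ts, this version returns both, whereas ROW_NUMBER() keeps exactly one), and the alias M and the column max_b_ts are made-up names. Whether it runs faster than the window version would need to be checked with EXPLAIN and a test run, since it reads T4 twice.

```sql
-- Sketch only: latest T4 row per ac_id using MAX + join instead of ROW_NUMBER().
-- M and max_b_ts are hypothetical names introduced for this example.
SELECT C.*
FROM T4 C
JOIN (
    -- one row per ac_id carrying its most recent b_ts among the kept events
    SELECT ac_id, MAX(b_ts) AS max_b_ts
    FROM T4
    WHERE event NOT IN ('CT','UPD')
    GROUP BY ac_id
) M
  ON C.ac_id = M.ac_id
 AND C.b_ts = M.max_b_ts
-- repeat the filter so excluded events cannot match on a shared timestamp
WHERE C.event NOT IN ('CT','UPD');
```

If used, this would take the place of the derived table C in the query above, and the `C.rank_ = 1` condition in the join would no longer be needed.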