1

I'm trying to join some customer applications (~560k records) to another servicing table (~4.6 Billion records) which contains multiple snapshots of customers. Keep in mind that the latter is partitioned by servicing date.

The aim is to join the servicing data as of the time of application, where the application date is equal to servicing date for each customer.

The way I've done my join relies on loading all partitions of the servicing table, but obviously this is a very expensive operation and causing my query to take a really long time to run. So would appreciate any help to optimize this join.

Below is the code I've tried:

SELECT * FROM
    applications apps
JOIN
    -- partitioned table
    (SELECT * FROM servicing WHERE serv_date > 0) serv
ON apps.customer_id = serv.customer_id
AND apps.app_date = serv.serv_date
rayven1lk
  • 133
  • 5
  • @rayven1lk-you may find some help here.. I have also joined a non partitioned table with partitioned table in hive. https://stackoverflow.com/questions/56939313/optimize-the-join-performance-with-hive-partition-table/56940800#56940800 – vikrant rana Jul 11 '19 at 05:20

0 Answers0