So I have 3 tables: Users, Purchases A, and Purchases B. I want to find all users that have made any purchase within the first 30 days of joining. So all users with a purchase from either Purchases A or Purchases B within the first 30 days after created_at.
A few things to note:
- The Users table has 200+ million rows
- Purchases A has billions of rows
- Purchases B has billions of rows
What is the best way to approach this with tables this large? I know I can accomplish this with 2 left joins, but I don't need all of the rows from the Purchases tables, I just need to know if any exist.
SELECT
user_id
,name
,created_at
FROM
users u
LEFT JOIN purchases_a pa
ON u.user_id = pa.user_id AND TO_DATE(pa.purchase_date) <= DATEADD(DAY, 30, u.created_at)
LEFT JOIN purchases_b pb
ON u.user_id = pb.user_id AND TO_DATE(pb.purchase_date) <= DATEADD(DAY, 30, u.created_at)
Is a join even the right approach here or would IN or EXISTS be more appropriate?