SETUP - I have a large but simple query that usually takes a second or two to run
SELECT DISTINCT t1.Col_1, t1.Col_2, t1.Col_3, Col_4, ... t2.Col_3, t3.Col_2, ...
(14 columns total)
FROM Table_1 t1 INNER JOIN Table_2 t2 ON (t1.COL_3 = t2.COL_1) ...
(4 joins total, each on a different column in Table_1)
Table_1 is the primary table, and its Col_1 is the primay key for the entire database (all other tables just have a foreign key off a column in Table_1.)
PROBLEM - I need to find the largest possible count if I WHERE only on t1.Col_2 and t1.Col_4. As in, for existing combination of values in column 2 and column 4 of table 1, what is the largest result set I would get if I selected based on any of those combinations? For scale, there are over 200,000 unique combinations of the two columns.
I tried to avoid the problem by collecting all combinations of COL_2 and COL_4 and then query for counts using those combinations, but it took 30 minutes to test ~2,500. That's too slow.
What I'm NOT trying to answer:
- How many unique combinations of Col_2 and Col_4 exist. example and another example
- a small-scale version of the same problem. This and this are some of those.
MY BEST GUESS - working from some of the lower-ranked answers to similar questions, I think that
SELECT t1.Col_2, t1.Col_4, COUNT(DISTINCT t1.Col_1) AS total
FROM TABLE_1 t1 INNER JOIN Table_2 t2 ...
GROUP BY t1.Col_2, t1.Col_4
ORDER BY COUNT(DISTINCT t1.Col_1) DESC //not necessary but useful for trying to analzye the results
should give me what I'm looking for, but my uncertainty is high enough to justify putting it in the question instead of posting as a self-answer.