2

SETUP - I have a large but simple query that usually takes a second or two to run

SELECT DISTINCT t1.Col_1, t1.Col_2, t1.Col_3, Col_4, ... t2.Col_3, t3.Col_2, ...
    (14 columns total)
FROM Table_1 t1 INNER JOIN Table_2 t2 ON (t1.COL_3 = t2.COL_1) ... 
    (4 joins total, each on a different column in Table_1)

Table_1 is the primary table, and its Col_1 is the primay key for the entire database (all other tables just have a foreign key off a column in Table_1.)

PROBLEM - I need to find the largest possible count if I WHERE only on t1.Col_2 and t1.Col_4. As in, for existing combination of values in column 2 and column 4 of table 1, what is the largest result set I would get if I selected based on any of those combinations? For scale, there are over 200,000 unique combinations of the two columns.

I tried to avoid the problem by collecting all combinations of COL_2 and COL_4 and then query for counts using those combinations, but it took 30 minutes to test ~2,500. That's too slow.

What I'm NOT trying to answer:

  • How many unique combinations of Col_2 and Col_4 exist. example and another example
  • a small-scale version of the same problem. This and this are some of those.

MY BEST GUESS - working from some of the lower-ranked answers to similar questions, I think that

SELECT t1.Col_2, t1.Col_4, COUNT(DISTINCT t1.Col_1) AS total 
FROM TABLE_1 t1 INNER JOIN Table_2 t2 ...
GROUP BY t1.Col_2, t1.Col_4
ORDER BY COUNT(DISTINCT t1.Col_1) DESC //not necessary but useful for trying to analzye the results

should give me what I'm looking for, but my uncertainty is high enough to justify putting it in the question instead of posting as a self-answer.

Community
  • 1
  • 1
Jeutnarg
  • 1,138
  • 1
  • 16
  • 28
  • What's your DBMS? Does it support RANK? – dnoeth Oct 29 '15 at 22:08
  • DBMS - whatever command line operations timesten gives you by default. And yes, RANK is a supported operation. The database is Oracle timeten version 11.2.2.6.4 – Jeutnarg Oct 30 '15 at 15:38
  • 2
    Your best guess is close, but you need `COUNT(*)` instead of `COUNT(DISTINCT t1.Col_1)`, it will be faster and the the count might be wrong with DISTINCT. And in your SETUP query the DISTINCT is probably not needed, too, this will cause a huge overhead if the optimizer can't remove it. – dnoeth Oct 31 '15 at 10:24
  • "all other tables just have a foreign key off a column in Table_1." - it is not clear what You mean. Does Table_2 have primary key on COL_1 and Table_1 foreign key on COL_3 to primary key of Table_2? – Antonín Lejsek Oct 31 '15 at 23:10
  • Every other table has been normalized directly off of table_1. So, every table besides table_1 has a foreign key constraint to one of table_1's columns and no 'independent' primary key that's used. It's more of a semantic distinction than a directly encoded distinction, but I thought it would help people who might get side-tracked on the other tables. – Jeutnarg Nov 01 '15 at 01:48
  • I think there is some confusion of therms. "A FOREIGN KEY in one table points to a PRIMARY KEY in another table." http://www.w3schools.com/sql/sql_foreignkey.asp Table_1 can have foreign key constraint pointing to primary key (or at least unique constraint) of Table_2, Table_3 and so on. Not the other way. If it were so, Col_2 and Col_4 columns of table_1 would both both have to be unique and Your problem would be trivial. – Antonín Lejsek Nov 01 '15 at 12:54

0 Answers0