0

Given the following data set, how would I find the email addresses that were references for the most ApplicationIDs that have an "Accepted" decision?

CREATE TABLE IF NOT EXISTS `EmailReferences` (
  `ApplicationID` INT NOT NULL,
  `Email` VARCHAR(45) NOT NULL,
  PRIMARY KEY (`ApplicationID`, `Email`)
);
INSERT INTO EmailReferences (ApplicationID, Email)
VALUES
(1, 'ref10@test.org'), (1, 'ref11@test.org'), (1, 'ref12@test.org'),
(2, 'ref20@test.org'), (2, 'ref21@test.org'), (2, 'ref22@test.org'),
(3, 'ref11@test.org'), (3, 'ref31@test.org'), (3, 'ref32@test.org'),
(4, 'ref40@test.org'), (4, 'ref41@test.org'), (4, 'ref42@test.org'),
(5, 'ref50@test.org'), (5, 'ref51@test.org'), (5, 'ref52@test.org'),
(6, 'ref60@test.org'), (6, 'ref11@test.org'), (6, 'ref62@test.org'),
(7, 'ref70@test.org'), (7, 'ref71@test.org'), (7, 'ref72@test.org'),
(8, 'ref10@test.org'), (8, 'ref81@test.org'), (8, 'ref82@test.org')
;

CREATE TABLE IF NOT EXISTS `FinalDecision` (
  `ApplicationID` INT NOT NULL,
  `Decision` ENUM('Accepted', 'Denied') NOT NULL,
  PRIMARY KEY (`ApplicationID`)
);
INSERT INTO FinalDecision (ApplicationID, Decision)
VALUES
(1, 'Accepted'), (2, 'Denied'),
(3, 'Accepted'), (4, 'Denied'),
(5, 'Denied'),   (6, 'Denied'),
(7, 'Denied'),   (8, 'Accepted')
;

Fiddle of same:http://sqlfiddle.com/#!9/03bcf2/1

Initially, I was using LIMIT 1 and ORDER BY CountDecision DESC, like so:

SELECT  er.email, COUNT(fd.Decision) AS CountDecision
FROM    EmailReferences AS er
JOIN    FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE   fd.Decision = 'Accepted'
GROUP   BY er.email
ORDER   BY CountDecision DESC
LIMIT   1
;

However, it occurred to me that I could have multiple email addresses that referred different "most accepted" decisions (i.e., a tie, so to speak), and those would be filtered out (is that the right phrasing?) with the LIMIT keyword.

I then tried a variation on the above query, replacing the ORDER BY and LIMIT lines with:

HAVING MAX(CountDecision)

But I realized that that's only half a statement: MAX(CountDecision) needs to be compared to something. I just don't know what.

Any pointers would be much appreciated. Thanks!

Note: this is for a homework assignment.

Update: To be clear, I'm trying to find value and count of Emails from EmailReferences. However, I only want rows that have FinalDecision.Decision = 'Accepted' (on matching ApplicantIDs). Based on my data, the result should be:

Email          | CountDecision
---------------+--------------
ref10@test.org | 2
ref11@test.org | 2
Strawberry
  • 33,750
  • 13
  • 40
  • 57
Matthew Denaburg
  • 163
  • 1
  • 15
  • Possible duplicate of [SQL select only rows with max value on a column](https://stackoverflow.com/questions/7745609/sql-select-only-rows-with-max-value-on-a-column) – Ben Dec 11 '17 at 04:33
  • @Ben That question looks like it only replies on data from one table. I'm trying to group data from one table (`EmailReferences`) but using matching a data that meets a criteria from another table (`FinalDecision`), joined on the `ApplicantID` columns. See my update. :) – Matthew Denaburg Dec 11 '17 at 13:43
  • Can you edit the question to contain only the correct, relevant information? – Strawberry Dec 11 '17 at 13:56
  • @Strawberry Done. Sorry, and thanks. – Matthew Denaburg Dec 11 '17 at 13:59
  • @Strawberry Sorry - I guess I misunderstood what you were asking. Thanks for the edit. :) – Matthew Denaburg Dec 11 '17 at 14:05

2 Answers2

0

MySQL still lack window functions, but when version 8 is production ready, this becomes easier. So for fuure reference, or for those databases like Mariadb that already have window functions:

CREATE TABLE IF NOT EXISTS `EmailReferences` (
  `ApplicationID` INT NOT NULL,
  `Email` VARCHAR(45) NOT NULL,
  PRIMARY KEY (`ApplicationID`, `Email`)
);
INSERT INTO EmailReferences (ApplicationID, Email)
VALUES
(1, 'ref10@test.org'), (1, 'ref11@test.org'), (1, 'ref12@test.org'),
(2, 'ref20@test.org'), (2, 'ref21@test.org'), (2, 'ref22@test.org'),
(3, 'ref30@test.org'), (3, 'ref31@test.org'), (3, 'ref32@test.org'),
(4, 'ref40@test.org'), (4, 'ref41@test.org'), (4, 'ref42@test.org'),
(5, 'ref50@test.org'), (5, 'ref51@test.org'), (5, 'ref52@test.org'),
(6, 'ref60@test.org'), (6, 'ref11@test.org'), (6, 'ref62@test.org'),
(7, 'ref70@test.org'), (7, 'ref71@test.org'), (7, 'ref72@test.org'),
(8, 'ref10@test.org'), (8, 'ref81@test.org'), (8, 'ref82@test.org')
;
 
CREATE TABLE IF NOT EXISTS `FinalDecision` (
  `ApplicationID` INT NOT NULL,
  `Decision` ENUM('Accepted', 'Denied') NOT NULL,
  PRIMARY KEY (`ApplicationID`)
);
INSERT INTO FinalDecision (ApplicationID, Decision)
VALUES
(1, 'Accepted'), (2, 'Denied'),
(3, 'Accepted'), (4, 'Denied'),
(5, 'Denied'),   (6, 'Denied'),
(7, 'Denied'),   (8, 'Accepted')
;
select email, CountDecision
from (
     SELECT   er.email, COUNT(fd.Decision) AS CountDecision
            , max(COUNT(fd.Decision)) over() maxCountDecision
     FROM EmailReferences AS er
     JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
     WHERE    fd.Decision = 'Accepted'
     GROUP    BY er.email
     ) d
where CountDecision = maxCountDecision
email          | CountDecision
:------------- | ------------:
ref10@test.org |             2

dbfiddle here

Community
  • 1
  • 1
Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51
0

For example...

SELECT a.*
  FROM 
     ( SELECT x.email
            , COUNT(*) total
         FROM emailreferences x
         JOIN finaldecision y
           ON y.applicationid = x.applicationid
        WHERE y.decision = 'accepted'
        GROUP
           BY x.email
     ) a
  JOIN
     ( SELECT COUNT(*) total
         FROM emailreferences x
         JOIN finaldecision y
           ON y.applicationid = x.applicationid
        WHERE y.decision = 'accepted'
        GROUP
           BY x.email
        ORDER 
           BY total DESC 
        LIMIT 1
     ) b
    ON b.total = a.total;
Strawberry
  • 33,750
  • 13
  • 40
  • 57