4

my apologies for asking what must be very simple to solve, but I just can't seem to wrap my mind around this.. I couldn't even come up with a really fitting title for my question, beg your pardons for that as well.

I have a poll where each user may post multiple answers to a question and others then vote on these answers. I need to get a result where the highest-voted answer of every user gets returned.

Test case: let's assume a question like "What is your most favourite song quote?"

CREATE TABLE `answers` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`authorId` INT,
`answer` TEXT NOT NULL ,
`votes` INT NOT NULL 
) ENGINE = MYISAM CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO `answers` VALUES
(1, 30, "The West is the Best", 120),
(2, 30, "Come on, baby, light my fire", 100),
(3, 31, "Everything's gonna be allright", 350),
(4, 31, "Sayin' oooh, I love you", 350),
(5, 31, "Singing sweet songs of melodies pure and true", 290),
(6, 32, "I'm your pole and all you're wearing is your shoes", 540),
(7, 32, "And I'm crazier when I'm next to her", 180),
(8, 32, "You hear the music in the air", 230),
(9, 30, "You know they are a liar", 190)

The result I expect to get is:

id | authorId | answer                                             | votes
 6 |       32 | I'm your pole and all you're wearing is your shoes | 540
 3 |       31 | Everything's gonna be allright                     | 350
 9 |       30 | You know they are a liar                           | 190

Basically I need to select the best answer for each author and then sort the result by the votes the best answers got. It may happen that two answers by the same author have the same number of votes; then only the first one posted (lower ID) should be selected (as demonstrated with answers #3 and #4). Two different answers by the same author may never appear in the result - each author may only win once.

I've searched and searched and tried and tried again and at the moment I feel pretty brainwashed.. it may be that this is not doable in a single SQL query; should that be the case, it might be worth pointing out that the application is written in PHP. I know I could just grab all the answers with ORDER BY votes desc, id asc and then iterate over the result, remember all the authorIds and chuck out any row with an authorId I've already seen, but I need to obtain a set number of records and that could get awkward (... would possibly need running the query again with an offset if I chuck out too many rows etc).. but in the end it might be the best solution if the single-query solution was overly complex or none at all...

Any ideas? :o)

yoozer8
  • 7,361
  • 7
  • 58
  • 93
Dan Kadera
  • 109
  • 1
  • 2
  • 8

4 Answers4

5
SELECT id, authorId, answer, votes
FROM (  SELECT id, authorId, answer, votes
        FROM answers
        ORDER BY votes DESC) AS h
GROUP BY authorId

This little neat trick is built basing on GROUP BY to retrieve the first row of each case. Usually this is by default ORDER BY id ASC, but through this sub query, the first row in each authorId with the highest votes.

Note: As mentioned by Iain Elder, this solution doesn't work with ONLY_FULL_GROUP_BY active and only works in MySQL. This solution is to a certain extent unsupported due to lack of documentation confirming this behavior. It works well for me and has always worked well for me however.

This method still works on the latest MySQL on sqlfiddle.

Robin Castlin
  • 10,956
  • 1
  • 28
  • 44
  • +1 for sharing that trick. Blew my mind, which is SQL-Server-trained. Does the GROUP BY clause always yield the values of the first row for selected columns that are in the grouping criteria, or is this some special case using the ORDER BY clause? – Iain Samuel McLean Elder Sep 10 '12 at 23:55
  • This is due to the `ORDER BY`. Otherwise it'd `GROUP BY` the table's default order, which often is sorted by `PRIMARY`. Basicly without the subquery, you'd get the data of lowest id row instead of highest votes row. – Robin Castlin Sep 11 '12 at 07:01
  • Thanks a lot for teaching me (and perhaps some others) this about GROUP BY statements! I always thought that selecting columns that don't appear in the GROUP BY clause is only useful if the values of the column are equal for all grouped rows and otherwise the resulting value is unpredictible.. I even think I must've read it in a MySQL manual page. So thanks for clarifying that to me :o) – Dan Kadera Sep 11 '12 at 21:18
  • Actually, upon reflection I'm accepting this as the solution to my question and I believe it also works in the situation I outlined in a comment to bluefeet's answer - if I need to eliminate the duplicates (same author, same votes) by a different criteria than lowest ID, I'd just add them to the ORDER clause in the subselect (as in `... ORDER BY votes DESC, date_posted ASC`), is that about right? :o) – Dan Kadera Sep 11 '12 at 21:45
  • @DanKadera This query will fail if the connection's sql_mode includes [ONLY_FULL_GROUP_BY](http://dev.mysql.com/doc/refman/5.5/en/server-sql-mode.html#sqlmode_only_full_group_by). Oracle and SQL Server enforce this behavior, but it's optional in MySQL. [Ike Walker](http://mechanics.flite.com/blog/2013/02/12/why-i-use-only-full-group-by/) uses the mode to avoid unintentional errors. [Roland Bourman](http://rpbouman.blogspot.co.uk/2007/05/debunking-group-by-myths.html) explains that MySQL's behavior is ANSI compliant if the non-group-by columns are functionally dependent (here they are not). – Iain Samuel McLean Elder Jun 10 '13 at 21:19
  • Thanks for your input Iain. I've updated my answer to notify about these points. – Robin Castlin Jun 11 '13 at 07:57
3

You can use a sub-select:

select min(a1.id), a1.authorid, a2.mxvotes
from answers a1
inner join
(
  select authorid, max(votes) mxvotes
  from answers
  group by authorid
) a2
  on a1.authorid = a2.authorid
  and a1.votes = a2.mxvotes
group by a1.authorid, a2.mxvotes
order by mxvotes desc

see SQL Fiddle with Demo

Taryn
  • 242,637
  • 56
  • 362
  • 405
  • Thanks a lot, that should do it. But if I may expand on my original question: what if there was a requirement to choose between multiple rows from the same author with the same number of votes based on a different criteria than the lowest ID, say a date column? Then I wouldn't get the IDs of the answers I want, but instead the dates when they were posted. With the IDs, getting other related data for the best answers is easy, but how would I go about doing the same with the dates they were posted? Obviously there could be duplicates then, dates not being necessarily unique. Thanks a lot though! – Dan Kadera Sep 11 '12 at 21:33
  • @DanKadera you would have to post a new question with the other details and more sample data, it would be guesswork based on the description above. Seeing sample data and the final expected result is very helpful in figuring out these issues. – Taryn Sep 11 '12 at 21:36
1

Great question, Dan.

MySQL lacks the analytical functions to make this easy to solve. A similar question has been asked of Oracle and was solved using the OVER clause with the MAX function. This works on SQL Server too.

You need to use subqueries to do it on MySQL. This works for me:

SELECT
  id,
  authorId,
  answer,
  votes
FROM answers AS firsts
WHERE id = (
  SELECT
    MIN(id)
  FROM answers AS favorites
  WHERE
    votes = (
      SELECT MAX(votes)
      FROM answers AS author_max
      WHERE author_max.authorId = favorites.authorID
    ) AND
    favorites.authorId = firsts.authorId 
)
ORDER BY votes DESC;

See my sqlfiddle for an executable example.

Community
  • 1
  • 1
Iain Samuel McLean Elder
  • 19,791
  • 12
  • 64
  • 80
  • I was surprised to discover that MySQL does not support the OVER clause or common table expressions while researching this answer. I believe these missing language features make analytical queries like this one harder to read. – Iain Samuel McLean Elder Sep 10 '12 at 20:54
  • Thanks, isme, this might be even easier to grasp than bluefeet's solution (I'm still largely at war with GROUP BY statements), but I believe it has the same problem that bluefeet's answer has (see my comment there). – Dan Kadera Sep 11 '12 at 21:38
0
select * from (select * from answers order by votes desc) as temp group by authorId
Fathah Rehman P
  • 8,401
  • 4
  • 40
  • 42