mySQL bringing back result it should not

Question

I have a table filled with tasting notes written by users, and another table that holds ratings that other users give to each tasting note.

The query that brings up all notes that are written by other users that you have not yet rated looks like this:

SELECT tastingNotes.userID, tastingNotes.beerID, tastingNotes.noteID, tastingNotes.note, COALESCE(sum(tasteNoteRate.Score), 0) as count, 
CASE 
WHEN tasteNoteRate.userVoting = 1162 THEN 1 
ELSE 0
END AS userScored
FROM  tastingNotes
left join tasteNoteRate on tastingNotes.noteID = tasteNoteRate.noteID
WHERE tastingNotes.userID != 1162 
Group BY tastingNotes.noteID
HAVING userScored < 1
ORDER BY count, userScored

User 1162 has written a note for note 113. In the tasteNoteRate table it shows up as:

noteID | userVoting | score
  113       1162        0

but it is still returned each time the above query is run....

I think that it may also be useful to present an explanation of what you are trying to achieve as well as this attempt that is not working as you expect. This particular approach may be completely misguided and there could be a simpler way to do what you are trying to do. — moreON, Mar 15 '17 at 03:56
Maybe you should exclude it as `WHERE tasteNoteRate.userVoting != 1162`? — Koshinae, Mar 15 '17 at 07:42
Looks like you need something like this: http://stackoverflow.com/questions/10968767/mysql-select-rows-that-do-not-have-matching-column-in-other-table — Paul Spiegel, Mar 15 '17 at 21:21

Solarflare · Answer 1 · 2017-03-15T15:11:17.283

MySQL allows you to use group by in a rather special way without complaining, see the documentation:

If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. [...] In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want.

This behaviour was the default behaviour prior to MySQL 5.7.

In your case that means, if there is more than one row in tasteNoteRate for a specific noteID, so if anyone else has already voted for that note, userScored, which is using tasteNoteRate.userVoting without an aggregate function, will be based on a random row - likely the wrong one.

You can fix that by using an aggregate:

select ..., 
   max(CASE 
   WHEN tasteNoteRate.userVoting = 1162 THEN 1 
   ELSE 0
   END) AS userScored
from ...

or, because the result of a comparison (to something other than null) is either 1 or 0, you can also use a shorter version:

select ..., 
   coalesce(max(tasteNoteRate.userVoting = 1162),0) AS userScored
from ...

To be prepared for an upgrade to MySQL 5.7 (and enabled ONLY_FULL_GROUP_BY), you should also already group by all non-aggregate columns in your select-list: group by tastingNotes.userID, tastingNotes.beerID, tastingNotes.noteID, tastingNotes.note.

A different way of writing your query (amongst others) would be to do the grouping of tastingNoteRates in a subquery, so you don't have to group by all the columns of tastingNotes:

select tastingNotes.*, 
       coalesce(rates.count, 0) as count, 
       coalesce(rates.userScored,0) as userScored
from tastingNotes
left join (
  select tasteNoteRate.noteID,
         sum(tasteNoteRate.Score) as count,
         max(tasteNoteRate.userVoting = 1162) as userScored
  from tasteNoteRate
  group by tasteNoteRate.noteID
) rates 
on tastingNotes.noteID = rates.noteID and rates.userScored = 0
where tastingNotes.userID != 1162 
order by count;

This also allows you to get the notes the user voted on by changing rates.userScored = 0 in the on-clause to = 1 (or remove it to get both).

Thanks for the help and the great explanation! – Mike Mar 16 '17 at 05:22 — Mike, Mar 16 '17 at 05:22

score 0 · Answer 2 · edited May 23 '17 at 12:09

Change to an inner join.

The tasteNoteRate table is being left joined to the tastingNotes, which means that the full tastingNotes table (matching the where) is returned, and then expanded by the matching fields in the tasteNoteRate table. If tasteNoteRate is not satisfied, it doesn't prevent tastingNotes from returning the matched fields. The inner join will take the intersection.

See here for more explanation of the types of joins:

What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?

Make sure to create an index on noteID in both tables or this query and use case will quickly explode.

Note: Based on what you've written as the use case, I'm still not 100% certain that you want to join on noteID. As it is, it will try to give you a joined table on all the notes joined with all the ratings for all users ever. I think the CASE...END is just going to interfere with the query optimizer and turn it into a full scan + join. Why not just add another clause to the where..."and tasteNoteRate.userVoting = 1162"?

If these tables are not 1-1, as it looks like (given the sum() and "group by"), then you will be faced with an exploding problem with the current query. If every note can have 10 different ratings, and there are 10 notes, then there are 100 candidate result rows. If it grows to 1000 and 1000, you will run out of memory fast. Eliminating a few rows that the userID hasn't voted on will remove like what 10 rows from eventually 1,000,000+, and then sum and group them?

The other way you can do it is to reverse the left join:

select ...,sum()... from tasteNoteRate ... left join tastingNotes using (noteID) where userID != xxx group by noteID, that way you only get tastingNotes information for other users' notes.

Maybe that helps, maybe not, but yeah, SCHEMA and specific use cases/example data would be helpful.

With this kind of "ratings of ratings", sometimes its better to maintain a summary table of the vote totals and just track which the user has already voted on. e.g. Don't sum them all up in the select query. Instead, sum it up in the insert...on duplicate key update (total = total + 1); At least thats how I handle the problem in some user ranking tables. They just grow so big so fast.

mySQL bringing back result it should not

2 Answers2