0

I am trying to number rows from a query so I know which page it will appear on.

My expectation is that the order of the rows returned by the query is the same order they were numbered in--that is, that row_num will nicely increment in the result set.

However, the GROUP BY which is causing 2-3 rows to end up out of order (with the GROUP BY statements removed all numbering is in order).

What happens with GROUP BY that causes the problem and how do I get row numbering to apply AFTER all the grouping and sorting is done?

SELECT g2.*, @rownum := @rownum + 1 AS row_num
    FROM (SELECT g.gemid as gemid, g.tab, rt.sum_rating as sum_rating
            FROM gems g
            LEFT JOIN (
                SELECT gemid, SUM(rating) as sum_rating 
                    FROM gemrating GROUP BY gemid) rt ON g.gemid = rt.gemid
            WHERE g.tab=0 GROUP BY g.gemid) g2
    LEFT JOIN gempage gp ON g2.gemid = gp.gemid
    JOIN (SELECT @rownum := 0) Z
    WHERE g2.tab=0 ORDER BY sort asc
philipxy
  • 14,867
  • 6
  • 39
  • 83
mseifert
  • 5,390
  • 9
  • 38
  • 100
  • You could try to do the numbering in an additional simple upper level select without ordering or grouping. – olivecoder Jul 17 '16 at 00:32
  • The manual explicitly says reading & assigning the same variable in the same select statement is undefined behaviour in MySQL, see the documentation re assignment & variables. [Why the order of evaluation for expressions involving user variables is undefined?](https://stackoverflow.com/a/44751302/3404097) See my comments (re an incorrect answer) at [MySQL - Define a variable within select and use it within the same select](https://stackoverflow.com/a/16715618/3404097). You must use a stored procedure or 8.0+ window functions. – philipxy Dec 09 '22 at 20:36
  • After a LEFT JOIN a WHERE, INNER JOIN or HAVING that requires a right table column to be not NULL "turns LEFT JOIN into INNER JOIN"--INNER JOIN could/should have been used instead. – philipxy Dec 09 '22 at 20:41
  • Does this answer your question? [Define a variable within select and use it within the same select](https://stackoverflow.com/questions/16715504/define-a-variable-within-select-and-use-it-within-the-same-select) – philipxy Dec 10 '22 at 02:20
  • Debug questions require a [mre]--cut & paste & runnable code including initialization; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For SQL include DDL & tabular initialization code. For debug that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. [ask] [Help] When you get a result you don't expect, pause your overall goal, chop to the 1st subexpression with unexpected result & say what you expected & why, justified by documentation. (Debugging fundamental.) – philipxy Dec 10 '22 at 02:20
  • @philipxy The answer that I posted is because using `@rownum := 0` is unreliable. The rows will not necessarily be numbered in the order specified by the sort. Perhaps you should read through the article I linked to in my answer below. My answer documents that `ROW_NUMBER() OVER()` worked and is reliable. It is also different than what you flagged, so I request that you remove / undo your "similar question" flag as is is NOT. – mseifert Dec 10 '22 at 02:28
  • Again: The problem is not the assignment to rownum, it is both assigning & reading it. But anyway assignment time in a select that doesn't read is also undefined. "Perhaps you should read through" my links & the manual. The blog is wrong that "it’s also possible to set and read a user variable in a single statement" using defined behaviour, and if they are relying on a particular implementation they don't justify it or the exact cases it applies to. A SO/SE question is a duplicate of another when the answer is the same, and the problem here is what I said. – philipxy Dec 10 '22 at 02:41
  • While I am not at your level of understanding, please give me some credit for having tried EVERY combination of assign and read possible in my use case. The example given is over simplified. In the end the answer that actually solved my problem was different and so that is what I am trying to point to. Others might actually find my different solution helpful. – mseifert Dec 10 '22 at 02:46
  • Please don't insert "EDIT"s/"UPDATE"s, just make your post the best presentation as of edit time. Please do not put answers into question posts, they belong in answer posts. This is why I edited out answers before. [Help] [meta] [meta.se] – philipxy Dec 10 '22 at 02:51
  • When I clicked on the button which said that the link you posted did not answer my question, it gave me a response which said to edit my question and state why my question was different. I was just following the instructions SO gave me. – mseifert Dec 10 '22 at 02:56
  • Your edit was not phrased as addressing the duplicate suggestion, it was not clear that you were doing that. But please when explaining why a dupe doesn't answer just make it part of straightforward presentation, don't have meta commentary. But the suggested duplicate does answer because the answer there says, you can't set & read the same user variable (and so your hypotheses about grouping are irrelevant). However, the accepted & upvoted answer at the duplicate is wrong. I couldn't suggest the other link in my comment as a duplicate because there's no upvoted or accepted answer (mine). – philipxy Dec 10 '22 at 03:01
  • **That blog supports & states everything I say except for a few extremely badly-phrased misleading sentences in it.** Like the one I quoted or "It works"--immediately followed by "It relies on [unjustified implementation details & no characterization of just when it works]. But in the future, [not]." Suggest you reread the blog in entirety closely. In particular the paragraph after the one starting with that quoted sentence: – philipxy Dec 10 '22 at 07:03
  • "The point I’m making is: these queries make assumptions which may hold today, but are not future-proof. *If* they hold today we have no conscious intention to break them, but one day we unavoidably will, as old code gets refactored, as new SQL features are implemented, as we come closer to the SQL standard. What has never been explicitly guaranteed may change. The accidental change may come from our side, or from your side (see the example with the citizenship’s full name and the join)." [And there is no description of when it does happen to hold.] – philipxy Dec 10 '22 at 07:04

1 Answers1

0

It's six and a half years later, but this problem was never fully solved. Fortunately with MySql 8 and MariaDB 10.2, Window functions were added which address the problem of sorts being unreliable. The solution is to use SELECT ROW_NUMBER() OVER as shown below.

This article gives a nice summary of why using @rownum:=(@rownum+1) isn't reliable. I was surprised not to find an answer on SO which gave me this, so I wanted to post for anyone else up at 2AM pulling your hair out thinking, "But it should work!". Now it will.

You can now group and sort and finally number your rows without MySql internally "optimizing" the final sort away.

    SELECT ROW_NUMBER() OVER(ORDER BY sort ASC) AS row_num, 
        g2.gemid,
        sum_rating
    FROM
        (
        SELECT
            g.gemid AS gemid,
            g.tab,
            rt.sum_rating AS sum_rating
        FROM
            gems g
        LEFT JOIN(
            SELECT gemid,
                SUM(rating) AS sum_rating
            FROM
                gemrating
            GROUP BY
                gemid
        ) rt
        ON
            g.gemid = rt.gemid
        WHERE
            g.tab = 0
        GROUP BY
            g.gemid
        ) g2
    LEFT JOIN gempage gp ON
        g2.gemid = gp.gemid
    WHERE
        g2.tab = 0
    ORDER BY
        sort ASC
mseifert
  • 5,390
  • 9
  • 38
  • 100