1

I tried to write my own window function in mysql but found something I don't understand as shown below.

The table employee has 13 rows,

+----+------+--------+
| id | name | salary |
+----+------+--------+
|  1 | a    |    123 |
|  2 | a    |    234 |
|  4 | a    |    254 |
|  8 | a    |    724 |
|  9 | a    |    432 |
| 12 | a    |    123 |
|  3 | b    |    124 |
| 10 | b    |    333 |
| 11 | b    |     11 |
| 13 | b    |    628 |
|  5 | c    |    111 |
|  6 | c    |    777 |
|  7 | c    |    666 |
+----+------+--------+

So if I use the following query,

select name, salary, 
      (case name when @prev_name  then  @rank := @rank + 1
       else @prev_name := name and @rank := 1 end) + 1 as Rank
from employee , (select @rank := 0,  @prev_name := null ) r  
order by name ;

The results are

+------+--------+------+
| name | salary | Rank |
+------+--------+------+
| a    |    123 |    1 |
| a    |    234 |    2 |
| a    |    254 |    3 |
| a    |    724 |    4 |
| a    |    432 |    5 |
| a    |    123 |    6 |
| b    |    124 |    7 |
| b    |    333 |    8 |
| b    |     11 |    9 |
| b    |    628 |   10 |
| c    |    111 |   11 |
| c    |    777 |   12 |
| c    |    666 |   13 |
+------+--------+------+

with 14 warnings,

+---------+------+--------------------------------------------------------------------+
| Level   | Code | Message                                                            |
+---------+------+--------------------------------------------------------------------+
| Warning | 1292 | Truncated incorrect DOUBLE value: 'a                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'a                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'a                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'a                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'a                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'a                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'a                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'b                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'b                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'b                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'b                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'c                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'c                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'c                             ' |
+---------+------+--------------------------------------------------------------------+

But if I change the order inside the else clause,

select name, salary, 
      (case name when @prev_name  then  @rank := @rank + 1
       else @rank := 1 and @prev_name := name end) + 1 as Rank
from employee , (select @rank := 0,  @prev_name := null ) r  
order by name ;

The output is different,

+------+--------+------+
| name | salary | Rank |
+------+--------+------+
| a    |    123 |    1 |
| a    |    234 |    2 |
| a    |    254 |    3 |
| a    |    724 |    4 |
| a    |    432 |    5 |
| a    |    123 |    6 |
| b    |    124 |    1 |
| b    |    333 |    2 |
| b    |     11 |    3 |
| b    |    628 |    4 |
| c    |    111 |    1 |
| c    |    777 |    2 |
| c    |    666 |    3 |
+------+--------+------+

without warnings.

My guess is that is related to logical operator short-cut and the return value of the assignment operator. But I could not found any useful information online.

This becomes even more interesting after I tried different ways.

So the following query gives totally different results sometimes,

set @prev_name := null; set @rank := 0; 
select name, salary, 
       (case name when @prev_name  then  @rank := @rank + 1 else @prev_name := name and @rank := 1 end) + 1 as Rank
from employee order by name ;

+------+--------+------+
| name | salary | Rank |
+------+--------+------+
| a    |    123 |    1 |
| a    |    234 |    1 |
| a    |    254 |    1 |
| a    |    724 |    1 |
| a    |    432 |    1 |
| a    |    123 |    1 |
| b    |    124 |    1 |
| b    |    333 |    1 |
| b    |     11 |    1 |
| b    |    628 |    1 |
| c    |    111 |    1 |
| c    |    777 |    1 |
| c    |    666 |    1 |
+------+--------+------+

With 13 warnings,

+---------+------+--------------------------------------------------------------------+
| Level   | Code | Message                                                            |
+---------+------+--------------------------------------------------------------------+
| Warning | 1292 | Truncated incorrect DOUBLE value: 'a                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'a                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'a                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'a                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'a                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'a                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'b                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'b                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'b                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'b                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'c                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'c                             ' |
| Warning | 1292 | Truncated incorrect DOUBLE value: 'c                             ' |
+---------+------+--------------------------------------------------------------------+
CS Pei
  • 10,869
  • 1
  • 27
  • 46
  • I could not reproduce your first result with your first query. – trincot May 19 '17 at 17:09
  • The return value of an assignment is normally the value assigned. Your code doesn't really make much sense. – user207421 May 19 '17 at 17:44
  • @trincot, that is one problem with that query, it is not deterministic. – CS Pei May 19 '17 at 17:51
  • @EJP, It does not make sense to me either. That is why I am asking here – CS Pei May 19 '17 at 17:57
  • What doesn't make sense is to AND the result of two assignments together using a short-cut operator and then express surprise because you get different results when you reverse the order, *unless* you are certain that both assignments yield `true` or both `false`. In the other cases your surprise has no rational basis. – user207421 May 20 '17 at 05:00

2 Answers2

0

I believe I have an explanation for what you are seeing. Consider your first query, which is not generating row numbers per group as you intended:

select name, salary, 
      (case name when @prev_name  then  @rank := @rank + 1
       else @prev_name := name and @rank := 1 end) + 1 as Rank
from employee , (select @rank := 0,  @prev_name := null ) r  
order by name ;

I hypothesize that the assignments @prev_name := name and @rank := 1 both return false. If this were true, then the other part of the else condition would not need to evaluate, because of short-circuiting, since the outcome would already be known to be false. In the case of your first query, this means that the assignment @prev_name := name would happen at each row. If we also assume that the @prev_name := null assignment happens before this first assignment, then it would explain why the first result set is just a row number over the entire table. This is the case because @rank := @rank + 1 would occur for every row, over the entire table. On the other hand, in your second query:

select name, salary, 
      (case name when @prev_name  then  @rank := @rank + 1
       else @rank := 1 and @prev_name := name end) + 1 as Rank
from employee , (select @rank := 0,  @prev_name := null ) r  
order by name ;

In this case, again @rank := 1 returns false, but now the assignment to @prev_name does not happen.

But, I would not write the query you did, because it seems confusing to me. Instead, I suggest that you use the following query:

SET @row_number = NULL;
SET @emp_name = NULL;

SELECT
    @row_number:=CASE WHEN @emp_name = name THEN @row_number + 1 ELSE 1 END AS Rank,
    @emp_name := name,
    salary
FROM employee
ORDER BY name
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • My questions is what is the return value of an assignment and what order of these expressions executed. But I could not find any docs – CS Pei May 19 '17 at 16:09
0

I could not reproduce your first result, but there are several things you should take into account:

  • The order in which the select clause expressions are fed with records does not depend on the order by clause, since the sorting only happens after all result records have been evaluated. So even if all other logic is correct, the result can vary.

  • The assignment (:=) operator has the lowest priority. This means and is executed before :=.

  • As an assignment only makes sense when at the left it has a variable, this variable can never take part in another operation before the assignment happens. So this is a kind of exception to the normal precendence rules.

  • For logical operators a short-cut evaluation is performed (not found in documentation).

  • When string values take part in a logical operator, they are converted to boolean (0 or 1)

Taking these rules into account, see how the following evaluates:

@prev_name := name and @rank := 1
  1. and has the highest precedence, so the argument at the left of it is evaluated first, this is the value of name
  2. The value for name is converted to a boolean as it has to take part in a boolean operation. As in your sample data none of the names represent a valid number, this value is 0
  3. As this is enough to know the result of the and operation, the right argument is not evaluated, and so the assignment @rank := 1 is not performed.
  4. Finally the left-most assignment is performed, and @prev_name gets value 0.

In your query this means the comparison case name when @prev_name ... could hardly ever be true: the first time it is not true because @prev_name is null, and after that it will always be 0. An exception to this would be if name has the value "0" or "1": in that case you will get a rank increase.

Now to the inverse expression:

@rank := 1 and @prev_name := name

The evaluation happens as follows:

  1. and has the highest precedence, so the argument at the left of it is evaluated first, this is 1
  2. As this is now not enough to know the result of the and operation, the right argument must be evaluated
  3. As the right side is an assignment, that assignment must happen first, so at this moment @prev_name gets the value of name
  4. For this value to partake in the boolean operation it is converted to a number, which most probably is 0 (see above): it will be 0 for the sample data you have given.
  5. This makes the result of the and operation to be 0
  6. Finally, this value (0) is assigned to @rank

In this scenario @prev_name gets a useful value, and if the recordset is traversed in the order we hope for, then indeed the rank will increase.

Another issue can surface because of this initialisation:

@prev_name := null

This does not tell MySql which datatype this variable has, and so it may decide it is of a numeric data type. This can lead to weird effects, where @prev_name stays null throughout the query, or takes on the character code of the first character in the name value.

Better approach

To better secure the order in which the records are fetched, you would need to fetch them in a sub-select, which you provide with the order by clause.

Only in the outer query you can then rely on a specific order of evaluation -- even though there is still no documented guarantee for that.

Initialise your variables with something that leaves no doubt about the data type, for instance:

@prev_name := ''

You can use the MySql if function for making assignments and at the same time return a value, like this:

if(a := b, value, value)

Whatever the assignment returns as boolean result, if will return value.

Taking all this together, I would suggest this query:

select name, salary, 
       @rank := if(name = @prev_name, @rank + 1, 
                   if(@prev_name := name, 1, 1)) as Rank
from   (select name, salary from employee order by name) employee,
       (select @rank := 0, @prev_name := '') init
order  by name;

Final remark

Other databases have window/analytic/OLAP functions, which makes ranking a much more reliable task.

Community
  • 1
  • 1
trincot
  • 317,000
  • 35
  • 244
  • 286