1

I have a table of data that contains costs. For any given record, I would like to know where the record falls in relation to other costs.

Basically, I want to know (how many records have lower costs) / (total number of records).

So far, I've come up with:

SELECT (SUM(IF(r.cost > c.cost, 1, 0)) + 1) / COUNT(1) as percent_rank 
FROM record r, 
   (SELECT cost FROM record WHERE cost IS NOT NULL) as c 
WHERE r.id = 10;

Running EXPLAIN on this query produces:

+----+-------------+------------+-------+---------------+---------+---------+-------+-------+-------------+
| id | select_type | table      | type  | possible_keys | key     | key_len | ref   | rows  | Extra       |
+----+-------------+------------+-------+---------------+---------+---------+-------+-------+-------------+
|  1 | PRIMARY     | r          | const | PRIMARY       | PRIMARY | 4       | const |     1 |             |
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL    | NULL    | NULL  | 21469 |             |
|  2 | DERIVED     | record     | ALL   | NULL          | NULL    | NULL    | NULL  | 21469 | Using where |
+----+-------------+------------+-------+---------------+---------+---------+-------+-------+-------------+

Is there a better way of running this query? 21469 is all the records in the table. I'm a bit worried about the cross join but this query would always be run with id = something so it's just 1 * total num rows.

lokisapocalypse
  • 100
  • 2
  • 9
  • An index on cost could help this should change the select_type from all to index this way MySQL does an complete index scan this should be faster than the full table scan because index file most off the time are smaller unless you have used "index shotgun" – Raymond Nijland Sep 25 '13 at 21:41
  • Find your answer here: http://stackoverflow.com/questions/1057027/calculating-percentile-rank-in-mysql – Chloe Jun 18 '15 at 04:37

1 Answers1

0

I would suggest getting the value of r.cost in a subquery, or if plausible putting it in a variable which is referenced in the query rather than having a join, as this then simplifies the logic you're needing a bit:

DECLARE @thisCost Cost_Datatype;
SELECT @thisCost = cost FROM record WHERE id = 10;

means the query now looks like:

SELECT (SUM(IF(@thisCost > cost, 1, 0)) + 1) / COUNT(1) as percent_rank 
FROM record 
WHERE cost IS NOT NULL 

having removed the complexity of the join, the conditional can also be removed, and the engine allowed to answer the query as a set operation, by putting the total number of rows in its own (sub)query:

SELECT (Count(*) / (SELECT Count(*) FROM record)) as [% with lower cost]
FROM record
WHERE cost IS NOT NULL AND @thisCost > cost

does that do the job?

John Bingham
  • 1,996
  • 1
  • 12
  • 15