1

I have a table with list of employees and the number of units that they have sold.

I want to get the top 25 percentile Avg units sold and Bottom 25 percentile Avg units sold.

I have created a representation of my data SLQ Fiddle

I really have no idea how to start on this? All the examples i see are for SQL Server and not MySQL. Here is what i am thinking.

I want 25 percentile and cant limit to 25 items. Basically it would involve:

1) #_of_employees = The number of total employees. 

2) #_of_employees_in_25_percentile =  #_of_employees*0.25 

3) Calculate the sum of the units sold by the top/bottom 25 percentile (limit #_of_employees_in_25_percentile)

4) Divide the sum by #_of_employees_in_25_percentile to get the average.

How can all this be done efficiently in MySQL?

pdw
  • 8,359
  • 2
  • 29
  • 41
codeNinja
  • 1,442
  • 3
  • 25
  • 61

3 Answers3

2

This is a solution that uses a devious trick I learned from this question.

SELECT id, unit_sold, n * 100 / @total AS percentile
FROM (
  SELECT id, unit_sold, @total := @total + unit_sold AS n
  FROM mydata, (SELECT @total := 0) AS total
  ORDER BY unit_sold ASC
) AS t

SQL Fiddle.

Community
  • 1
  • 1
pdw
  • 8,359
  • 2
  • 29
  • 41
  • Nice solution! Just one minor point, I think it should be `@total := @total + 1` instead of `@total := @total + unit_sold`, it appears that the percentile is computed on the employee number, rather than the total unit sold. – chiwangc Aug 15 '14 at 08:32
0

What about this?

SELECT
  SUM(unit_sold) AS sum_tot, SUM(unit_sold)/count(id) AS average,
  SUM(CASE WHEN percentile<25 THEN unit_sold ELSE 0 END) AS sum_top25,
  SUM(CASE WHEN percentile<25 THEN 1 ELSE 0 END) AS count_top25,
  SUM(CASE WHEN percentile<25 THEN unit_sold ELSE 0 END)/SUM(CASE WHEN percentile<25 THEN 1 ELSE 0 END) AS average_top25,
  SUM(CASE WHEN percentile>75 THEN unit_sold ELSE 0 END) AS sum_bottom25,
  SUM(CASE WHEN percentile>75 THEN 1 ELSE 0 END) AS count_bottom25,
  SUM(CASE WHEN percentile>75 THEN unit_sold ELSE 0 END)/SUM(CASE WHEN percentile>75 THEN 1 ELSE 0 END) AS average_bottom25
FROM
(SELECT
  id, unit_sold, c * 100 / @counter AS percentile
FROM
   (SELECT
      m.*, @counter:=@counter+1 AS c 
   FROM
     (SELECT @counter:=0) AS initvar, mydata AS m
   ORDER BY unit_sold desc
   ) AS t
WHERE
  c <= (25/100 * @counter)
  OR c >= (75/100 * @counter)
) AS t2

Output:

SUM_TOT AVERAGE     SUM_TOP25   COUNT_TOP25 AVERAGE_TOP25 SUM_BOTTOM25  COUNT_BOTTOM25  AVERAGE_BOTTOM25
850     283.3333    500         1            500          350           2               175

See SQL Fiddle.

The idea is to use the MySQL: LIMIT by a percentage of the amount of records? solution to get the percentiles. Based on that (and on pdw answer) we create an output in which we just show the top 25% and bottom 75%.

Finally, we count and sum to get the values you requested.


Note this runs on top of the command:

SELECT
  id, unit_sold, c * 100 / @counter AS percentile
FROM
   (SELECT
      m.*, @counter:=@counter+1 AS c 
   FROM
     (SELECT @counter:=0) AS initvar, mydata AS m
   ORDER BY unit_sold desc
   ) AS t
WHERE
  c <= (25/100 * @counter)
  OR c >= (75/100 * @counter)

Whose output is:

ID  UNIT_SOLD   PERCENTILE
d   500         20
a   250         80
e   100         100
Community
  • 1
  • 1
fedorqui
  • 275,237
  • 103
  • 548
  • 598
-1

How about going with this logic:

  • Select all, order by percentile (DESC), limit to 25
  • Select all, order by percentile (ASC), limit to 25

Is this the type of logic you're looking for ?

Sample queries:

$q1 = mysql_query(SELECT * FROM table_name ORDER BY percentile DESC LIMIT 25)
$q2 = mysql_query(SELECT * FROM table_name ORDER BY percentile ASC LIMIT 25)
davewoodhall
  • 998
  • 3
  • 18
  • 43
  • Sorry, replace percentile by unit_sold. This should allow you to view the quantity of units sold. From there, calculate the percentile. – davewoodhall Aug 11 '14 at 18:26
  • Limit cannot be 25. I need 25 percentile. Not 25. See updated question. – codeNinja Aug 11 '14 at 18:30
  • Ok, after reading the changes, may I suggest you look at this question and the confirmed solution ? http://stackoverflow.com/questions/3065855/select-nth-percentile-from-mysql – davewoodhall Aug 11 '14 at 22:24
  • @DaveWoodhall that answer is somewhat similar but not what i need. I need the Average Units sold by the Top or Bottom 25% of employees. The answer your pointed to give the `Row` that is closest to `X` percentile. – codeNinja Aug 12 '14 at 13:21