2

I am using a modified version of a query similiar to another question here:Convert SQL Server query to MySQL

Select *
from
(
SELECT tbl.*, @counter := @counter +1 counter
FROM (select @counter:=0) initvar, tbl
Where client_id = 55
ORDER BY ordcolumn
) X
where counter >= (80/100 * @counter);
ORDER BY ordcolumn

tbl.* contains the field 'client_id' and I am attempting to get the top 20% of the records for each client_id in a single statement. Right now if I feed it a single client_id in the where statement it gives me the correct results, however if I feed it multiple client_id's it simply takes the top 20% of the combined recordset instead of doing each client_id individually.

I'm aware of how to do this in most databases, but the logic in MySQL is eluding me. I get the feeling it involves some ranking and partitioning.

Sample data is pretty straight forward.

Client_id  rate    
1          1
1          2
1          3
(etc to rate = 100)
2          1
2          2
2          3
(etc to rate = 100)

Actual values aren't that clean, but it works.

As an added bonus...there is also a date field associated to these records and 1 to 100 exists for this client for multiple dates. I need to grab the top 20% of records for each client_id, year(date),month(date)

Community
  • 1
  • 1
Twelfth
  • 7,070
  • 3
  • 26
  • 34
  • Can you post some sample data? – EoinS Jul 20 '16 at 21:31
  • Well without data we cant help, but looks like you want [ROW_NUMBER() OVER (PARTITION BY)](http://stackoverflow.com/a/8177307/3470178) – Juan Carlos Oropeza Jul 20 '16 at 21:32
  • @EoinS - sample data added...it's what I'm working with for the time being until I can prove this works. Current logic uses 10 temp tables and takes hours, I'm hoping this solution is a bit faster – Twelfth Jul 20 '16 at 22:00

2 Answers2

2

You need to do the enumeration for each client:

SELECT *
FROM (SELECT tbl.*, @counter := @counter +1 counter
             (@rn := if(@c = client_id, @rn + 1,
                        if(@c := client_id, 1, 1)
                       )
             )
      FROM (select @c := -1, @rn := 0) initvar CROSS JOIN tbl
      ORDER BY client_id, ordcolumn
     ) t cross join
     (SELECT client_id, COUNT(*) as cnt
      FROM tbl
      GROUP BY client_id
     ) tt
where rn >= (80/100 * tt.cnt);
ORDER BY ordcolumn;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Did you mean for your outer `WHERE` clause to reference `@rn`? I get the feeling it might be useless outside the subquery; it is going to "reset" for each client, meaning all clients' rows will be compared to a number only relevant to the one with the highest client_id. – Uueerdo Jul 20 '16 at 21:38
  • Seems close to what I need...working with the idea here. Thanks Gordon, I'm pretty sure you answer most of my questions on here. – Twelfth Jul 20 '16 at 22:07
  • @Uueerdo . . . Totally correct. That gets fixed by using an aggregation query. – Gordon Linoff Jul 21 '16 at 01:40
1

Using Gordon's answer as a starting point, I think this might be closer to what you need.

SELECT t.*
    , (@counter := @counter+1) AS overallRow
    , (@clientRow := if(@prevClient = t.client_id, @clientRow + 1,
                if(@prevClient := t.client_id, 1, 1) -- This just updates @prevClient without creating an extra field, though it makes it a little harder to read
               )
     ) AS clientRow
    -- Alteratively (for everything done in clientRow)
    , @clientRow := if(@prevClient = t.client_id, @clientRow + 1, 1) AS clientRow
    , @prevClient := t.client_id AS extraField
    -- This may be more reliable as well; I not sure if the order 
    -- of evaluation of IF(,,) is reliable enough to guarantee
    -- no side effects in the non-"alternatively" clientRow calculation.
FROM tbl AS t
INNER JOIN (
    SELECT client_id, COUNT(*) AS c
    FROM tbl 
    GROUP BY client_id
   ) AS cc ON tbl.client_id = cc.client_id
INNER JOIN (select @prevClient := -1, @clientRow := 0) AS initvar ON 1 = 1
WHERE t.client_id = 55
HAVING clientRow * 5 < cc.c  -- You can use a HAVING without a GROUP BY in MySQL 
     -- (note that clientRow is derived, so you cannot use it in the `WHERE`)
ORDER BY t.client_id, t.ordcolumn
;
Uueerdo
  • 15,723
  • 1
  • 16
  • 21
  • Thanks uueerdo....ill be working with this tonight and will give an update tomorrow morning. The logic makes sense at any rate – Twelfth Jul 20 '16 at 22:54