1

So I am trying to build a query that will show me which users have the most points, for each type of activity. You can see the table strucutre below. Each activity has an activity_typeid and each of those carries a certain activity_weight.

In the example below, Bob has scored 50 points for calls and 100 points for meetings. James has scored 100 points for calls and 100 points for meetings.

userid     activity_typeid     activity_weight
------------------------------------------------------------
123 (Bob)  8765 (calls)                50
123 (Bob)  8121 (meetings)             100
431 (James) 8765 (calls)               50
431 (James) 8121 (meetings)            100
431 (James) 8765 (calls)               50

I want to be able to output the following:

  1. Top Performer for Calls = James
  2. Top Performer for Meetings = Bob, James.

I don't know the activity_typeid's in advance, as they are entered randomly, so I was wondering if it is possible to build some sort of query that calculates the SUM for each DISTINCT/UNIQUE activity_typeid ?

Thanks so much in advance.

user2656127
  • 655
  • 3
  • 15
  • 31
  • Why is (James) (calls) duplicated in your table? – user4035 Oct 03 '13 at 07:54
  • Sorry, I should've explained that better. A new entry is entered into this table each time this action (calls) is performed. So, another 50 points is added to his total points. So it's not a duplicate per say. – user2656127 Oct 03 '13 at 07:55
  • See it now, you said, that James has 100 points for calls = 50 + 50 – user4035 Oct 03 '13 at 07:56
  • Could you provide SQL code for this table and sample data on SQLFiddle, so people could check their answers. Plus the desired output. – user4035 Oct 03 '13 at 07:58

5 Answers5

2

What you need is equivalent of analytic function DENSE_RANK(). One way to do it in if you need top performers for each activity

SELECT a.activity_typeid, GROUP_CONCAT(a.userid) userid
  FROM
(
  SELECT activity_typeid, userid, SUM(activity_weight) activity_weight
    FROM table1
  -- WHERE ...
   GROUP BY userid, activity_typeid
) a JOIN 
(
  SELECT activity_typeid, MAX(activity_weight) activity_weight
    FROM
  (
    SELECT activity_typeid, userid, SUM(activity_weight) activity_weight
      FROM table1
      -- WHERE ...
     GROUP BY userid, activity_typeid
  ) q
   GROUP BY activity_typeid
) b 
    ON a.activity_typeid = b.activity_typeid
   AND a.activity_weight = b.activity_weight
 GROUP BY activity_typeid

Another way to emulate DENSE_RANK() in MySQL is to leverage session variables

SELECT activity_typeid, GROUP_CONCAT(userid) userid
  FROM
(
  SELECT activity_typeid, userid, activity_weight,
         @n := IF(@g = activity_typeid, IF(@v = activity_weight, @n, @n + 1) , 1) rank,
         @g := activity_typeid, @v := activity_weight
    FROM
  (
    SELECT activity_typeid, userid, 
           SUM(activity_weight) activity_weight
      FROM table1
    -- WHERE ...
     GROUP BY activity_typeid, userid
  ) q CROSS JOIN (SELECT @n := 0, @g := NULL, @v := NULL) i
     ORDER BY activity_typeid, activity_weight DESC, userid
) q
 WHERE rank = 1
 GROUP BY activity_typeid

Output:

| ACTIVITY_TYPEID |  USERID |
|-----------------|---------|
|            8121 | 123,431 |
|            8765 |     431 |

Here is SQLFiddle demo for both queries

peterm
  • 91,357
  • 15
  • 148
  • 157
  • If I wanted to add a WHERE clause, should I add it in a, b and c or just once at the end? – user2656127 Oct 03 '13 at 08:16
  • @user2656127 You're very welcome :) See updated answer. You want both `WHERE` clauses to be the same. – peterm Oct 03 '13 at 08:23
  • Thanks so much for a great answer. If I could bug you with one more thing. I'm looking to run 1 additional join on here, mainly to get the user name from the user table using the userid. Should I place that at the end? In addition, will this code work with an unlimited amount of activity_typeid's in a table (i.e. a competition that drives 10 different types of activities), or just a set number? THANKS AGAIN :D – user2656127 Oct 03 '13 at 08:28
  • @user2656127 Do you need just usernames or you also need userid? – peterm Oct 03 '13 at 08:39
  • Well, I will use the userid to join to the users table to select name and profilepicture. – user2656127 Oct 03 '13 at 08:39
  • @user2656127 Then you probably don't want to aggregate userids of performers that rank with ties into a delimited string – peterm Oct 03 '13 at 08:45
  • Could you explain that a little more? – user2656127 Oct 03 '13 at 08:49
  • I placed this at the end of the query, worked, but are there any risks in doing so? INNER JOIN users1 u ON a.userid = u.id – user2656127 Oct 03 '13 at 08:50
  • @user2656127 There are two approaches. **1)** don't aggregate in outer select by `activity_typeid` and put username and picture columns directly into the resultset **2)** to pack all column values per user in delimited string and apply `GROUP_CONCAT()` on top of that. Obviously you need to use different delimiters in inner `CONCAT` and in `GROUP_CONCAT()`. If you use php on client side you may find second approach very useful because you can easily `explode()` values while you iterate over the resultset. See this http://sqlfiddle.com/#!2/494a32/3 for illustration. – peterm Oct 03 '13 at 09:03
  • So, I wrote the query like this, and I am getting the desired output, but it's slightly different to the query you just linked to there. http://pastebin.com/kaFUR21J Is there any major problem with mine? – user2656127 Oct 03 '13 at 09:09
  • @user2656127 To illustrate the major problem see this http://sqlfiddle.com/#!2/494a32/6 example. As you may see using your query you loose Bob's name and picture while you have his userid. Your version of query works correctly only if there are no ties between users. But in this scenario, when there are no ties, you don't need `GROUP_CONCAT()` and `GROUP BY`. BTW your version works only because MySQL `GROUP BY` extension is very lax and forgives you this. In any other major RDBMs (SQL Server, Oracle, Postgres) you would get an error if you try to execute it. – peterm Oct 03 '13 at 10:09
  • @user2656127 The rule of thumb is you either apply an aggregate function to all columns that are not part of `GROUP BY` or you don't use `GROUP BY` at all. – peterm Oct 03 '13 at 10:12
0

You must use the GROUP BY statement to calculate the sum for each user and each activity typeid. Try something like this:

SELECT userid, activity_typeid, SUM(activity_weight)
FROM table
GROUP BY userid, activity_typeid

Then use this as a subquery to determine the top performer for each activity_typeid.

Dan
  • 10,480
  • 23
  • 49
0

If you want to use an aggregate function, you will need to use GROUP BY, instead of DISTINCT. You would group by userid and activity_typeid, and select SUM(activity_weight).

JoeFryer
  • 2,751
  • 1
  • 18
  • 23
0

Sure, just use group by like this :

SELECT t.userid, t.activity_typeid, sum(t.activity_weight) 
FROM YOURTABLE t
GROUP BY t.userid, t.activity_typeid
Kristof
  • 3,267
  • 1
  • 20
  • 30
0

Query to select the (single) top performer for a given activity:

SELECT userid, activity_typeid, SUM(activity_weight)
FROM ?
WHERE activity_typeid = ?
GROUP BY userid
ORDER BY SUM(activity_weight) DESC
LIMIT 1

You can change the LIMIT to output the top 5 or 10 performers, and then inspect them in code to see which you want to show (e.g. if two people have the same score).

Vegard Larsen
  • 12,827
  • 14
  • 59
  • 102
  • Thanks! The problem is the WHERE activity_typeid = ? That is what I'm not sure how to solve, because I don't know that, and want to loop over all the id's in that table (WHERE an id matches) – user2656127 Oct 03 '13 at 07:58
  • You know you can build the query at run time when you do know the `activity_typeid`? Look up "parameterized queries" to see how it can be done very easily. – Vegard Larsen Oct 03 '13 at 08:00