How to select non aggregated data when using aggregation functions

Question

I am trying to include non-aggregated data, e.g. user_id after applying an aggregated function that does not include that same data.

I have found a solutions and looking for a more elegant solution.

I am looking for a solution without using a window function (like row_number).

Here is my data: It represent an event of a competition. Every unique combination of event_date and tier is a unique competition. My goal is to get the winner of each competition.

CREATE TABLE interview_2 
(
    user_id INT,
    tier VARCHAR(25),
    event_date DATE,
    user_score INT
);

INSERT INTO interview_2 VALUES ( 1, 'A', GETDATE(), 100 );
INSERT INTO interview_2 VALUES ( 2, 'B', GETDATE(), 80);
INSERT INTO interview_2 VALUES ( 1, 'A', DATEADD( DAY, 1, GETDATE() ), 90);
INSERT INTO interview_2 VALUES ( 3, 'A', DATEADD( DAY, 1, GETDATE() ), 110);
INSERT INTO interview_2 VALUES ( 4, 'A', GETDATE(), 60);
INSERT INTO interview_2 VALUES ( 5, 'B', GETDATE(), 50);
GO

My solution was a simple join:

SELECT scores.tier,
       scores.event_date,
       users.user_id,
       scores.max_score
FROM 
    interview_2 users 
JOIN 
    (SELECT 
         tier,
         event_date,
         MAX(user_score) AS 'max_score'
     FROM 
         interview_2 scores
     GROUP BY 
         tier, event_date) scores ON users.tier = scores.tier
                                  AND users.event_date = scores.event_date
                                  AND users.user_score = scores.max_score
ORDER BY 
    1, 2, 4;

Output:

tier    event_date  user_id user_score
--------------------------------------
A       2022-03-18  1       100
A       2022-03-19  3       110
B       2022-03-18  2       80

Is there a simpler solution?

Does this answer your question? [Get top 1 row of each group](https://stackoverflow.com/questions/6841605/get-top-1-row-of-each-group) — Thom A, Mar 18 '22 at 14:21
^^ Wow.. [this answer](https://stackoverflow.com/a/48412942/8895292) from the links above is very elegant. — SOS, Mar 19 '22 at 06:17

Nick Fotopoulos · Answer 1 · 2022-03-21T12:46:40.743

Update

Your existing query is good. Whether or not it's the simplest solution is kind of subjective. One alternative would be to put the subquery in the select list. It looks different, and might even perform different depending on your environment, but ultimately it the same logic, just organized differently.

SELECT  DISTINCT
        tier        = i.tier,
        event_date  = i.event_date,
        user_id     = (
            SELECT
                w.user_id
            FROM    @interview_2 AS w
            WHERE
                w.user_score = MAX(i.user_score)
                AND w.tier = i.tier
                AND w.event_date = i.event_date
        ),
        top_score   = MAX(i.user_score)
FROM    @interview_2 AS i
GROUP BY
    i.tier,
    i.event_date
ORDER BY
    i.tier,
    i.event_date,
    MAX(i.user_score);

ORIGINAL POST

Window Functions:

SELECT DISTINCT
    users.tier,
    users.event_date,
    FIRST_VALUE(users.user_id) OVER (PARTITION BY
                                        users.tier,
                                        users.event_date
                                    ORDER BY users.user_score DESC
                            ),
    MAX(users.user_score) OVER (PARTITION BY users.tier, users.event_date)
FROM    interview_2 AS users;

OVER PARTITION BY basically allows you to use aggregate functions on a column by column basis plus several other neat windowing functions. You can use different grouping and ordering for each. They are amazing once you understand them.

In this case:

FIRST_VALUE(user_id) will return the very first user_id it finds PARTITIONED (grouped) BY tier and event_date and ORDER BY user_score DESC ensures that the user_id with the highest score is the first record.
MAX(user_score) will return the higest score it finds PARTITIONED (grouped) BY tier, event_date. No ordering is necessary here since it's an aggregate.
DISTINCT is included because for each record with a corresponding tier/event_date the FIRST_VALUE and MAX will be returned.

Here is a good starter article on Window Functions

Complete example

WITH
    interview_2 AS (
        SELECT
            *
        FROM    (
            VALUES (1, 'A', GETDATE(), 100),
                (2, 'B', GETDATE(), 80),
                (1, 'A', DATEADD(DAY, 1, GETDATE()), 90),
                (3, 'A', DATEADD(DAY, 1, GETDATE()), 110),
                (4, 'A', GETDATE(), 60),
                (5, 'B', GETDATE(), 50)
        ) AS i2 (user_id, tier, event_date, user_score)
    )
SELECT DISTINCT
    users.tier,
    users.event_date,
    FIRST_VALUE(users.user_id) OVER (PARTITION BY
                                        users.tier,
                                        users.event_date
                                    ORDER BY users.user_score DESC
                            ),
    MAX(users.user_score) OVER (PARTITION BY users.tier, users.event_date)
FROM    interview_2 AS users;

Thanks you I al looking for another approach without window function. — Nir, Mar 18 '22 at 17:49
I completely missed the "non-window function" part of your post. Aside from window functions, your existing solution is about as simple as it gets. — Nick Fotopoulos, Mar 21 '22 at 12:47

How to select non aggregated data when using aggregation functions

1 Answers1

Update

ORIGINAL POST

Window Functions:

Complete example