Pulling additional field data based on the return of an aggregated field.

Question

This question may have already been asked, but I have been seeing it pop up numerous places here.

Let's take a basic table:

[Table A]

id value attribute 
1   4      yellow
2   6      green 
3   5      blue

I want to create a query that will find the MAX() of the field value in [Table A] and then return its associated attribute. In this case, that would be green.

Here is how I would write this given what I currently know about SQL.

SELECT id, value , attribute FROM [Table A]
WHERE value IN (
SELECT MAX(value) FROM [Table A] ) 
ORDER BY id

This is not optimal for larger tables since I have full scan [Table A] at least once, and because most designs will not have an index on the value column.

This question gets even more complicated when we attempt to add windowed functions into the mix.

Let's make another table.

[Table B]

id    value    attribute  id_group 
1    4         yellow        1
2    4         yellow        1
3    5         blue          1 
4    9         green         2 
5    8         yellow        2  
6    7         yellow        2 
7    9         blue          3 
8    3         yellow        3
9    4         blue          3

Instead of simply taking the MAX() value of the entire table, this time I want the MAX() over a grouping of id values. I've gone ahead and already pre-assigned the id fields in [Table B] to the correct grouping in id_group

Now I could do something similar to my query above for the windowed function. As I understand it, it would look something like this.

SELECT id , value , attribute 
FROM [Table B] WHERE value IN ( 
SELECT MAX(value) OVER (PARTITION BY id_group))

However, this will not return the desired result. This is because we have lost information within the WHERE clause sub-select. We would need to return the id_group field associated to the MAX(value) returned from that sub-select as well.

Obviously, returning two fields from the same sub-select would break our WHERE clause syntax.

So my question is this. What is the best method for aggregating a field on a table and returning the information for fields on the same record?

Of course there will be instances where the aggregate record, in this case MAX(), could actually be a record set and not a single record line.

Does anyone have extensive experience writing queries that will do this without the need for extraneous variables?

Terribly sorry if this has already been asked and answered.

what should the output look like? – Vamsi Prabhala Aug 17 '18 at 19:21 — Vamsi Prabhala, Aug 17 '18 at 19:21

score 0 · Answer 1 · answered Aug 17 '18 at 19:31

For you second query I guess you want to see the result typically either

SELECT id,
       value,
       attribute,
       group_id
       FROM (SELECT id,
                    value,
                    attribute,
                    group_id,
                    rank() OVER (PARTITION BY id_group
                                 ORDER BY value DESC) r
                    FROM [Table B]) x
       WHERE r = 1;

or

SELECT id,
       value,
       attribute,
       group_id
       FROM (SELECT id,
                    value,
                    attribute,
                    group_id,
                    row_number() OVER (PARTITION BY id_group
                                       ORDER BY value DESC) r
                    FROM [Table B])
       WHERE r = 1;

would produce.

The difference between the two is, that using rank() will output ties. If there are two rows having the same, highest value, they'd both get outputted. Using row_number() only one of them would make it to the output. Without further specification this can be any of the two.

Thank you, I had forgotten about rank(). – Edward Aug 17 '18 at 19:50 — Edward, Aug 17 '18 at 19:50

Pulling additional field data based on the return of an aggregated field.

1 Answers1