Select distinct on an item and return the additional fields associated

Question

I have a table of batches and dates. The batches are not unique so if I run a select distinct on the batch number this will rectify the issue.

I currently have:

SELECT batch_number, date_received, expiry_date, prod_code, quantity
FROM scheme.stquem
GROUP BY batch_number

This gives the error of:

Column 'scheme.stquem.date_received' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

How can I return the additional fields associated with distinct batch numbers?

EDIT: Ideally returning the minimum date/value for each field.

ANSWER:

SELECT batch_number, min(prod_code), min(date_received), min(expiry_date), min(quantity) 
FROM scheme.stquem 
GROUP BY batch_number

Either add the rest of the fields to the `GROUP BY`, or remove the `GROUP BY` and make it a `SELECT DISTINCT`. — Siyual, Aug 11 '15 at 13:49
And how should the database engine determine which values to use for the fields other than `batch_number`? Do you want the record with the most recent `date_received`? The record with the highest `quality`? — Bacon Bits, Aug 11 '15 at 13:52
If you have more than one `date_received` per `batch_number`, which one should be returned? The first, the last, the most median, the mode? This also looks suspiciously like a SQL Server error, but you have tagged with MySQL. Please can you clarify which product you are using? — GarethD, Aug 11 '15 at 13:52
@Siyual, if I add all the others fields to the group by it returns all rows. I want to return every unique batch number and the associated fields. Distinct only allows for one field to be returned. Would you be able to query it where the rowstamp (pk - sort of) is in the table of distinct batch numbers? — matt, Aug 11 '15 at 14:03
That's not how `DISTINCT` works... If you do `SELECT DISTINCT` it will give you the unique rows that are returned across all columns. You really should avoid using `GROUP BY` here, as it's misleading for what you're trying to do. — Siyual, Aug 11 '15 at 14:07
I know it's wrong that's why I'm on here. The overall question still applies which is how to return additional fields where the batch number is unique? — matt, Aug 11 '15 at 14:26
@BaconBits, if there were duplicates then I would like to take the min(date_received), min(expiry_date), min(quantity) — matt, Aug 11 '15 at 14:35

score 2 · Answer 1 · answered Aug 11 '15 at 14:53

If as your comment states:

if there were duplicates then I would like to take the min(date_received), min(expiry_date), min(quantity)

Then just use the MIN function:

SELECT batch_number, MIN(date_received), MIN(expiry_date), MIN(prod_code), MIN(quantity)
FROM scheme.stquem
GROUP BY batch_number;

The problem with this approach, is given the following sample data:

batch_number    date_received   expiry_date prod_code   quantity
-----------------------------------------------------------------
1               2015-08-01      2015-09-01  p1          5
1               2015-08-02      2015-08-08  p1          3
1               2015-08-02      2015-08-09  p0          1

You will get fields from all different rows, so you will get your date_received from the first row, your expiry_date from the 2nd, and prod_code and quantity from the 3rd. This is valid, and aggregates are useful, however I am not sure this would be what you would want.

Instead I imagine you would want to rank your rows within each batch_number, which you can do using ROW_NUMBER():

SELECT  batch_number,
        date_received,
        expiry_date,
        prod_code,
        quantity,
        RowNumber = ROW_NUMBER() 
                        OVER(PARTITION BY batch_number 
                            ORDER BY date_received, expiry_date, prod_code, quantity)
FROM    scheme.stquem

This would give you an output of:

batch_number    date_received   expiry_date prod_code   quantity    RowNumber
------------------------------------------------------------------------------
1               2015-08-01      2015-09-01  p1          5           1
1               2015-08-02      2015-08-08  p1          3           2
1               2015-08-02      2015-08-09  p0          1           3

Then you just need to put the above query into a subquery, and select only the first row for each partition:

SELECT  batch_number, date_received, expiry_date, prod_code, quantity
FROM    (   SELECT  batch_number,
                    date_received,
                    expiry_date,
                    prod_code,
                    quantity,
                    RowNumber = ROW_NUMBER() 
                                    OVER(PARTITION BY batch_number 
                                        ORDER BY date_received, expiry_date, prod_code, quantity)
            FROM    scheme.stquem
        ) AS t
WHERE   t.RowNumber = 1;

So here you get the fields associated with the minimum date_received, rather than for unrelated records.

score 0 · Accepted Answer · answered Aug 11 '15 at 13:54

0

Try wrapping the date_received, expiry_date and prod_code with MAX() and the quantity with SUM(). Does that help?

SELECT batch_number, MAX(date_received), MAX(expiry_date), MAX(prod_code), SUM(quantity) FROM scheme.stquem GROUP BY batch_number

answered Aug 11 '15 at 13:54

Rico Humme

456
5
12

Why are you assuming they want the `MAX` or `SUM` aggregates of the columns? Nothing in their question indicates that. – Siyual Aug 11 '15 at 13:56
Because if you want to group by just one field, there is no denying the fact you have to use aggregates. It doesn't have to be MAX or SUM. – Rico Humme Aug 11 '15 at 14:05
You're assuming he wants to aggregate the results. The OP is asking for distinct rows. He's confusing `GROUP BY` with `DISTINCT`. – Siyual Aug 11 '15 at 14:10
As he already said in his comment, when using distinct for the selection he gave, he might as well leave out the distinct, because it will return all the rows. Distinct is row based and not column based. It will only work when you request 1 column from the database, which is not the case here. – Rico Humme Aug 11 '15 at 14:22
@RicoHumme, this is the nearest to what has just worked for me. See the edit. Thanks Rico – matt Aug 11 '15 at 14:45

Select distinct on an item and return the additional fields associated

2 Answers2