How to use a column in select statement which is not in aggregate function nor in group by clause?

Question

Above is the table and on the basis of which I have to answer the below question in my past interview.

Q. The most recent order value for each customer?

Answer which I have given in interview:

select customerID, ordervalue, max(orderdate)
from office
group by customerID;

I know since we are not using ordervalue in aggregate and nor in group by so this query will throw an error in SQL but I want to know how to answer this question.

Many times in my past interviewers asked a question where I need to use a column in select statement which is not in aggregate function or nor in group by. So I want know in general what is a workaround for it with an example so that I can resolve these type of questions or how to answer these questions.

[Please do not upload images of code/data/errors when asking a question.](//meta.stackoverflow.com/q/285551). Paste the data table as formatted text into your request instead. — Thorsten Kettner, May 01 '22 at 09:36

score 2 · Answer 1 · 2022-05-01T08:52:52.877

The work around depends on what is being asked. For the requirements you have above, I think it makes sense to create (customerid, MAX(orderdate)) pairs.

SELECT customerid, MAX(orderdate)
FROM office 
GROUP BY customerid;

Then you can use them to match the row you need from the table.

SELECT customerid, ordervalue, orderdate
FROM office
WHERE (customerid, orderdate) IN 
    (SELECT customerid, MAX(orderdate) 
     FROM office 
     GROUP BY customerid);

Note, this assumes there is only one order per customer per day. If there were more than one, you would see the most recent order(s) per customer. You could add also a GROUP BY on the outer query if needed.

SELECT customerid, MAX(ordervalue), orderdate
FROM office AS tt
WHERE (customerid, orderdate) IN 
    (SELECT customerid, MAX(orderdate) 
     FROM office
     GROUP BY customerid)
GROUP BY customerid, orderdate;

If the non-aggregate column you need in the SELECT is functionally dependent on the column in the GROUP BY, you can add a subquery in the SELECT.

We can extend your example by adding a name column, where the name of different customers could be the same. If you wanted name instead of ordervalue, just match the customerid of the outer query to get name.

SELECT customerid, 
  (SELECT name FROM office WHERE customerid=o.customerid LIMIT 1) AS name,
  MAX(orderdate)
FROM office AS o
GROUP BY customerid;

score 0 · Answer 2 · answered May 01 '22 at 09:30

You are approaching the task as follows: Aggregate all rows to get one result line per customer, showing the maximum order date and its order value. The problem with this: you'd need an aggregate function to get the value for the maximum order date. The only DBMS I know of featuring such a function is Oracle with KEEP FIRST/LAST.

So look at the task from a different angle. Don't think aggregation-wise where you could count and add up values for a group and get the minimum or maximum value over all the group's rows, because after all you just want to pick single rows. (That is, pick the top 1 row per customer.) In order to pick rows, you'll use a WHERE clause.

One option has been shown by Steve in his answer:

select *
from office
where (customerid, orderdate) in 
( 
  select customerid, max(orderdate)
  from office
  group by customerid
);

This is a good, straight-forward approach. (Some DBMS, though, don't feature tuples with IN clauses.)

Another way to get the "best" row for a customer would be to pick those rows for which not exists a better row:

select *
from office
where not exists
( 
  select null
  from office better
  where better.customerid = office.customerid
  and better.orderdate > office.orderdate
);

And then there is the option to use a window function (aka analytic function) in order to get those rows. One example is to get the maximum dates along with the rows' data:

select customerid, ordervalue, orderdate
from
(
  select
    customerid, ordervalue, orderdate,
    max(orderdate) over (partition by customerid) as max_orderdate
  from office
)
where orderdate = max_orderdate;

And with ROW_NUMBER, RANK, and DENSE_RANK there are window functions to assign numbers to your rows in the order you want. You number them such that the best rows get number 1 and pick them. The big advantage here: you can apply any order, deal with ties and not only get the top 1, but the top n rows.

select customerid, ordervalue, orderdate
from
(
  select
    customerid, ordervalue, orderdate,
    row_number() over (partition by customerid order by orderdate desc) as rn
  from office
)
where rn = 1;

How to use a column in select statement which is not in aggregate function nor in group by clause?

2 Answers2