0

I have a table of information about commercial airline flights, which includes, among other things, the city where a given flight originates (origin_city) and the flight time (actual_time).

I'm trying to understand why a (perhaps naively) simple query does not return the originating city for the flight with the longest flight time.

The following query:

SELECT FLIGHTS.origin_city, MAX( FLIGHTS.actual_time ) as [time]
FROM FLIGHTS

results in the error:

Column 'FLIGHTS.origin_city' is invalid in the select list because
it is not contained in either an aggregate function or the GROUP BY
clause.

The way I would think that SQL would interpret this query is that it would find the row that has the maximum flight time and then display the origin_city and the actual_time columns for that row.

Why doesn't that work?

If I leave the origin_city out of the SELECT, then the query runs fine:

SELECT MAX( FLIGHTS.actual_time ) as [time]
FROM FLIGHTS

If I group the flights by origin_city, then SQL doesn't have any issues pulling both the origin_city and the actual_time from each row.

SELECT FLIGHTS.origin_city, MAX( FLIGHTS.actual_time ) as [time]
FROM FLIGHTS
GROUP BY FLIGHTS.origin_city

Note that I am not looking for a query that works; I am trying to understand why my naive query at the beginning of the post doesn't work . . . :-P

Carl Parker
  • 1,005
  • 1
  • 8
  • 10
  • 1
    looks like your question is same as this , see if it helps https://stackoverflow.com/questions/13998552/why-do-we-need-group-by-with-aggregate-functions – Karthick Trichy Chandrasekaran Jun 04 '21 at 16:54
  • 1
    If the query has a `GROUP BY` clause, all columns not included in this clause should show up aggregated (as in `SUM()`, `MIN()`, `MAX()`, `AVG()` etc. The exception of the rule is when the other columns are directly dependent on the key included in the `GROUP BY` clause. – The Impaler Jun 04 '21 at 16:55

2 Answers2

2
SELECT FLIGHTS.origin_city, MAX( FLIGHTS.actual_time ) as [time]
FROM FLIGHTS

The way I would think that SQL would interpret this query is that it would find the row that has the maximum flight time and then display the origin_city and the actual_time columns for that row.

I find that interpretation silly. How do you propose that SQL handle these queries?

SELECT FLIGHTS.origin_city, MAX(FLIGHTS.actual_time) as [time],
       MIN(FLIGHTS.actual_time)
FROM FLIGHTS;

Or:

SELECT FLIGHTS.origin_city, AVG(FLIGHTS.actual_time) as [time]
FROM FLIGHTS;

Not so obvious, eh? I suppose you could make some "special case" interpretation if there is one aggregation function and it is MIN() or MAX() but not any other. In fact, SQLite does that -- to the detriment of people who learn aggregation through that database.

Further, there is a real easy way to do what you want:

SELECT FLIGHTS.origin_city, FLIGHTS.actual_time
FROM FLIGHTS
ORDER BY FLIGHTS.actual_time DESC
OFFSET 0 ROW FETCH FIRST 1 ROW ONLY;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

This might seem as though it has a simple answer... however... the simple answer as to to why your first query can't give you what you want is that you never told it to group on the FLIGHTS.origin_city and so it doesn't know what to do with that field. You added an Aggregate function of MAX but you also added another field... so now it's not only looking at FLIGHTS.actual_time but also FLIGHTS.origin_city and so now when we group the FLIGHTS.origin_city the query knows you are looking to see the MAX actual_time per each origin_city.

The query doesn't know which city you are needing a MAX(FLIGHTS.actual_time). If you want to know a specific origin_city and its MAX actual_time you can add a filter to the WHERE clause to specify which origin_city you want to see... otherwise by adding the origin_city to the GROUP BY Clause this now tells the query you wish to see the MAX actual_time per each city.

If it's not added to the GROUP BY clause the SQL doesn't know how to use MAX.

Code Novice
  • 2,043
  • 1
  • 20
  • 44