SQL: Unknown column 'new_id' in 'where clause'

Question

Where I run a query like below -
select CONCAT(id, rev) as new_id from DOCS where new_id>1
i am getting below error -

Unknown column 'new_id' in 'where clause'

Why does not sql allow to use the column alias name in the where clause. Any strong reason for that. I expect it should be smart enough to run an outer query again based on the alias after it is done with processing/filtering/aggregation based on the original column names.

I am instead required to do the below "modified query"-

select * from (select CONCAT(id, rev) as new_id from DOCS) as t where new_id>1

Is there a better approach then the modified query.

Note: I agree -

select CONCAT(id, rev) as new_id, new_id+1 as latest_id from DOCS where new_id>1 could be little confusing to the engine, but even that seems possible.
select CONCAT(id, rev) as new_id from DOCS where new_id>1 and DOCS.rev = 1 - seems confusing.

Are these the reasons or there is more to it.

Its because of the order in which things get done in running the query, live with it [`select CONCAT(id, rev) as new_id from docs where CONCAT(id, rev)>11`](http://sqlfiddle.com/#!9/d361ff/573) — RiggsFolly, Dec 30 '22 at 20:43
In MySQL - use not WHERE but HAVING. In Postgre - Use not column name but its expression. — Akina, Dec 30 '22 at 20:44
To point made by @RiggsFolly see [Select](https://www.postgresql.org/docs/current/sql-select.html) *Description* section for general order in which the parts of a `select` query are processed. — Adrian Klaver, Dec 30 '22 at 20:50
You could use having clause. It's all about order operation and performance. the columns and selected values are unknown to the where clause. The where clause can used indexd values to limit the data quickly the select often manipulates the data and is unable to therefore use an index. You could work around it by as you have, wrap the query in a query, use a CTE, or use the HAVING clause instead of the where. The having executes after the data has been selected. See: https://stackoverflow.com/questions/53143613/is-there-any-specific-order-of-execution-in-sql-query — xQbert, Dec 30 '22 at 20:57
Granted, that example was for SQL Server; but may RDBMS parse in the same logical order. — xQbert, Dec 30 '22 at 21:01
"*Why does not sql allow to use the column alias name in the where clause*" - because this is what the SQL standard defined about 40 years ago. There are only two solutions: repeat the expression in the WHERE clause or use a derived table. — , Dec 30 '22 at 22:20
@Hogan It seems that I have said clearly enough - "**In MySQL**". http://sqlfiddle.com/#!9/d361ff/575 — Akina, Dec 31 '22 at 07:28
@Akina -- this is something only mysql does. All other SQL platforms work different. Showing mysql doing something does not make it correct for all platforms. — Hogan, Jan 02 '23 at 23:28
@Akina -- that might be true (and shows why tagging with a lot of platforms is a bad idea) but just because an answer is tagged with a platform my choice is to give the platform independent answer as a best practice. Search engines give fuzzy results and you help more people with non-platform specific answers. — Hogan, Jan 04 '23 at 20:20

score 1 · Answer 1 · answered Dec 30 '22 at 20:42

1

Why not just do this:

select CONCAT(id, rev) as new_id 
from DOCS 
where CONCAT(id, rev) > 1

This will work the way you expected your original code to work in terms of performance CONCAT(id, rev) will only be calculated once per row.

answered Dec 30 '22 at 20:42

Hogan

69,564
10
76
117

score 0 · Answer 2 · answered Dec 30 '22 at 21:20

I expect it should be smart enough to run an outer query again based on the alias after it is done with processing/filtering/aggregation based on the original column names

I suggest you carefully and thoughtfully read that sentence you wrote on your question. You expect the DBMS to create a query to select ALL the possible rows (selectively) excluding WHERE clauses, then assemble a resultset with your aliased, calculated, aggregated, etc... columnns, and then run ANOTHER query on said resultset to apply the WHERE clause so you can use the alias on it.

Both DBMS and Database Admins have come a LONG way trying to run queries as smooth and efficient as possible; what you "expect" the database to do is actually a very slow and innefficient method to run that query.

In other words: you cannot do what you want not because the DBMS is not "smart enough" to filter based on an alias, but because it IS smart enough to first run the JOIN and WHERE clauses so it can reduce the read/recovered rows as much as possible, THEN process those rows to create the resultset exactly as specified on the SELECT clause.

**i am a RDBMS/SQL admirer as-well**, i don't mean to be though with that statement. But I only hinted - programmatically it's possible but it could not be optimized way to run the query like you hinted and that answers. — samshers, Dec 30 '22 at 21:37
@samshers -- there are even more issues that Josh did not touch on. Consider a system with partitioned or sharded data store. In this case the pre-result query would be partitioned and sent to many machines to resolve and then merged back together before names of the select list are resolved. If it had to not apply the where conditions before this parallelization the operation would be incredibly slow for most cases. — Hogan, Jan 04 '23 at 20:24

SQL: Unknown column 'new_id' in 'where clause'

2 Answers2