1

In short, when I do the following SQL command:

SELECT EXTRACT(YEAR FROM pub_date) AS year FROM news_stories

I get a big list of years from 2008-2014.

But if I do

SELECT EXTRACT(YEAR FROM pub_date) AS year FROM news_stories WHERE year > 2010

I get an empty result. What am I doing wrong?

Dan Goodspeed
  • 3,484
  • 4
  • 26
  • 35

3 Answers3

3

Output columns are visible in the ORDER BY or GROUP BY clause, because those are applied after evaluating expressions in the SELECT clause. They are not visible in the WHERE or HAVING clause. So you cannot reference the output column name year. You would have to repeat the expression based on input columns. Obviously, there is an input column of the same name, or you would get an exception. Details:

For your query to be fast, you should rather use a sargable predicate:

SELECT EXTRACT(YEAR FROM pub_date) AS year
FROM   news_stories
WHERE  pub_date >= '2010-1-1'::date;

This is generally faster, because Postgres can compare values in the pub_date to the given value directly, without extracting the year from every row first.
More importantly, a plain index on pub_date can be used this way - if Postgres expects that route to be faster (index-only scan or selective enough).

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • So the tl;dr I took from that is "WHERE pub_date >= '2010-1-1'::date" is faster than "WHERE EXTRACT(YEAR FROM pub_date) > 2010"? I just did some AB speed testing and saw no change (the script takes 3-5 seconds to run in either case)… but if you say it's better to go the sargable method, I'm good with that. :-) Thanks! – Dan Goodspeed Dec 08 '14 at 06:27
  • @DanGoodspeed: You'll see a big difference with an index on `pub_date` in place when you select a small part of the table or if pg can use an index-only scan ([example](http://stackoverflow.com/questions/24244026/optimize-groupwise-maximum-query/24377356#24377356)). When large parts of the table have to be read, pg will use a sequential scan and you'll hardly see a difference. Either way, 3-5 sec seems *very* long. Is your table huge? How do you measure? What's your version of pg? I would use `EXPLAIN ANALYZE` to test. You don't want to include network overhead in your tests. – Erwin Brandstetter Dec 08 '14 at 06:41
  • It was a very unscientific test. It's a perl script which does a few database calls run through the web (this being one of them), and I just timed the page-load. – Dan Goodspeed Dec 08 '14 at 07:37
1

Try this. You cannot use thealias namein where clause. The WHERE clause is processed before the SELECT clause(*). Since you have Column named Year you didn't get a Error.

SELECT EXTRACT(YEAR FROM pub_date) AS year 
FROM news_stories 
WHERE EXTRACT(YEAR FROM pub_date) > 2010
Pரதீப்
  • 91,748
  • 19
  • 131
  • 172
  • Thank you. I thought one of the main purposes of AS was to be able to shorten some of these SQL statements like how I wrote it. – Dan Goodspeed Dec 07 '14 at 03:18
  • @DanGoodspeed ALIASES can be used to create a **temporary** name for columns or tables. If you want use the alias name in `where Clause` then check Gordan answer. – Pரதீப் Dec 07 '14 at 03:58
1

You must have a column named year in the database, because otherwise the SQL would return an error. YOu cannot use an alias in the select or where clause where it is defined.

Either use a subquery or repeat the expression:

SELECT EXTRACT(YEAR FROM pub_date) AS year
FROM news_stories
WHERE EXTRACT(YEAR FROM pub_date) > 2010;

Or as a subquery:

select *
from (SELECT EXTRACT(YEAR FROM pub_date) AS year
      FROM news_stories
     ) n
where year > 2010;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786