1

This question comes as a result of a limitation in Amazon Redshift, the columnar analytics database based on Paraccel. One of the unsupported features is references in a GROUP BY column to the results of a correlated subquery.

For example, the following statement will generate a Redshift error because GROUP BY uses list which is generated by a subquery.

select listing.listid,
(select count (sales.listid) from sales where sales.listid=listing.listid) as list
from listing
group by list, listing.listid; 

The following example from Gordon Linoff is another unsupported use case (an answer to a specific question that generated this general question).

select type, (case when cnt > XXX then url end) as url, sum(cnt) as visit_cnt
from (select type, url, count(*) as cnt
      from t
      group by type, url
     ) t
group by type, url
order by type, sum(cnt) desc;

The purpose of this question is to find a generic pattern for overcoming this particular Amazon Redshift correlated subquery limitation. What are the alternative SQL patterns for achieving the same outcome as using values from correlated subqueries?

Community
  • 1
  • 1
Sim
  • 13,147
  • 9
  • 66
  • 95
  • For folks unfamiliar with Redshift: Assume you're working with roughly a PostgreSQL 8.1/8.2 feature set, plus a few extra features and extra limitations. No CTEs or window functions, no `unnest` and `array_agg`, etc. (BTW, it'd be great if there was a Redshift SQLFiddle). – Craig Ringer Dec 29 '13 at 01:41
  • 2
    You do have Window functions: http://docs.aws.amazon.com/redshift/latest/dg/c_Window_functions.html – Guy Dec 29 '13 at 07:07
  • +1 @Guy: there are indeed window functions. They do have limitations, though. – Sim Dec 30 '13 at 00:49

1 Answers1

1

A left join should do the trick, unless I'm missing something.

SELECT listing.listid
      ,COUNT(sales.listid)
FROM      listing
LEFT JOIN sales
       ON sales.listid = listing.listid
GROUP BY listing.listid
ORDER BY COUNT(sales.listid) DESC
; 
Joe Harris
  • 13,671
  • 4
  • 47
  • 54