[AWS Athena]: How to use DISTINCT with SELECT ALL, and ORDER BY

Asked Mar 13 '20 at 19:41

Active Mar 15 '20 at 19:23

Viewed 2,280 times

I'm using AWS Athena, I have a query as such:

SELECT * FROM foo ORDER BY purchase_date ASC

But I want to de-dupe the records.

Since this is Athena, and it needs to process records in parallel, I'm not sure how to write the DISTINCT clause.

How can I make this query so that it doesn't have any duplicate records in the result set?

Thanks

edited Mar 15 '20 at 19:23

shuvalov

asked Mar 13 '20 at 19:41

Dustin Anderson

just use `select distinct c1,c2...from tbl` – Vamsi Prabhala Mar 13 '20 at 19:49
@VamsiPrabhala that doesn't work because the nature of AWS Athena is such that it's streaming records and becuase I have an ORDER BY clause, it somehow needs all the records... anyway let me try what you're saying thank you very much – Dustin Anderson Mar 13 '20 at 19:52
that implementation detail you shouldn't worry about – Vamsi Prabhala Mar 13 '20 at 19:55
@VamsiPrabhala so I did SELECT DISTINCT foo, bar from tbl but I got only the columns foo and bar in the result set. How to get all columns? – Dustin Anderson Mar 13 '20 at 20:57
1

This seems to work: SELECT * FROM ( SELECT *, row_number() over (partition by foo) as row_number FROM tbl ) AS ROWS WHERE row_number = 1 order by bar which I got from: https://stackoverflow.com/questions/12693089/pgerror-select-distinct-order-by-expressions-must-appear-in-select-list – Dustin Anderson Mar 13 '20 at 21:53

0 Answers0