I have a PostgreSQL table with the following relevant fields:
url
title
created_at
There can be many rows that contain identical URLs but different titles. Here are some sample rows:
www.nytimes.com | The New York Times | 2016-01-01 00:00:00
www.wsj.com | The Wall Street Journal | 2016-01-03 15:32:13
www.nytimes.com | The New York Times Online | 2016-01-06 07:19:08
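To make the sample reproducible, here is a minimal setup sketch. The table name headlines is the one used in the queries further down; the column types are assumptions, since the real table definition is not shown here.

```sql
-- Assumed column types (text / timestamp); the actual definition may differ.
CREATE TABLE headlines (
    url        text,
    title      text,
    created_at timestamp
);

-- The three sample rows listed above.
INSERT INTO headlines (url, title, created_at) VALUES
    ('www.nytimes.com', 'The New York Times',        '2016-01-01 00:00:00'),
    ('www.wsj.com',     'The Wall Street Journal',   '2016-01-03 15:32:13'),
    ('www.nytimes.com', 'The New York Times Online', '2016-01-06 07:19:08');
```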
I'm trying to obtain an output that lists the following fields:
1) url
2) title that corresponds to the highest value of created_at
3) count of all title values for that unique url
So, output rows for the above sample would look something like this:
www.nytimes.com | The New York Times Online | 2
www.wsj.com | The Wall Street Journal | 1
Based on the numerous SO posts I've read on similar questions, it looks like my best option for obtaining the first two fields (url and latest title) would be to use DISTINCT ON:
select distinct on (url) url, title from headlines order by url, created_at desc
Likewise, to obtain the first and third fields (url and count of all title), I could simply use GROUP BY:
select url, count(title) from headlines group by url
What I can't figure out is how to combine the two approaches into a single query that returns all three values.
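For illustration, one possible shape such a combined query could take is sketched below. It keeps DISTINCT ON and replaces the GROUP BY count with a window count per url. This relies on PostgreSQL computing window functions before DISTINCT ON discards rows, so the count still covers every row for the url. The alias title_count is a name made up for this sketch, and this is not necessarily the best or only approach.

```sql
-- Sketch only: DISTINCT ON picks the latest row per url,
-- while the window function counts all titles for that url.
SELECT DISTINCT ON (url)
       url,
       title,
       count(title) OVER (PARTITION BY url) AS title_count  -- hypothetical alias
FROM   headlines
ORDER  BY url, created_at DESC;
```

Like the GROUP BY version, count(title) skips rows where title is NULL; count(*) would count every row for the url instead.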
(Edited to provide more clarity.)