1

I am having a little trouble with a SQL query. I have a table to stores results against a date. I'd like to select all distinct results with the maximum (most recent) date. I can make this work with:

select distinct(r.data), max(c.committed_at) as timestamp
from results r
     inner join commits c on (r.commit_id=c.id) 
where r.repository_id=65 AND data_type='data_model'
group by data
order by timestamp DESC

This is good and all works.

However, I want to be able to select a third column, which is a hash. This hash is unique to every result row.

But when I add it in to the SELECT clause, I of course have to add it to my GROUP BY too.

select distinct(r.data), max(c.committed_at) as timestamp, c.hash
from results r
     inner join commits c on (r.commit_id=c.id) 
where r.repository_id=65 AND data_type='data_model'
group by data, c.hash
order by timestamp DESC

And this throws the query out. I no longer get distinct results (of course).

I tried an aggregate function on hash:

select distinct(r.data), max(c.long_hash), max(c.committed_at) as timestamp
from results r
    inner join commits c on (r.commit_id=c.id) 
where r.repository_id=65 AND data_type='data_model'
group by data
order by timestamp DESC

But then I don't get the maximum time, I get the maximum hash.

I feel like I just want to be able to say "find the distinct results with the maximum timestamp and then just include the hash in the result", but I can't figure out how...

Cheers, Ben

Edit: my attempt at this with DISTINCT ON is:

select distinct on (data) data, c.committed_at as committed_at, c.long_hash
from results r 
    inner join commits c on (r.commit_id=c.id) 
where r.repository_id=65 AND data_type='complexity'
order by data, c.committed_at DESC

But this requires an ORDER BY on data, which again, throws out the ordering of my results...

Edit 2:

Executing this query gives the following result:

'{"status":"supported","per_method":"10.2","total":"815.2","data"<snip>...}','2017-01-04 13:25:51','4a44bccca804c28c6a5e61b36b5ebcb716d4c11f'
'{"status":"supported","per_method":"8.8","total":"649.3","data":<snip>...}','2017-01-02 23:35:11','d747e657a81c5c6da4262a5298c3071082b2af41'
'{"status":"supported","per_method":"10.7","total":"944.3","data":<snip>...}','2017-01-08 17:28:57','ff4be5fa6dc88237e7855ed1b534baee69aa8800'

As you can see, the data column is in order (according the JSONB column rules), whereas the timestamp column is not (it goes from Jan 4th, to Jan 2nd, to Jan 8th).

benjimix
  • 609
  • 7
  • 18
  • which dbms is being used, like Oracle,MySQL,Postgres,SQL Server ... – Vamsi Prabhala Jun 27 '18 at 05:20
  • Oops! Postgres. I have tried DISTINCT ON too, but that then required an ordering by the data column, which is not what I want either... – benjimix Jun 27 '18 at 05:21
  • 2
    `distinct` is **NOT** a function. `distinct (a),b` is exactly the same thing as `distinct a,(b)` or `distinct a,b`. It is always applied to all columns in the select list –  Jun 27 '18 at 05:25
  • Thanks. I just want to avoid ordering by that column... is there any way to do that? – benjimix Jun 27 '18 at 06:16
  • Possible duplicate of [Select first row in each GROUP BY group?](https://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group) – Michael Freidgeim Sep 28 '19 at 23:54

2 Answers2

2

You don't need group by with select distinct on. You just adjust the order by. So:

select distinct on (r.data) r.*, c.*
from results r inner join
     commits c
     on r.commit_id = c.id 
where r.repository_id = 65 and
      data_type = 'data_model'
order by r.data, c.committed_at desc;

distinct on does all the work you want to do. It returns one row for each combination of values in the distinct on list. The row is the first one specified by the order by.

I think this is what you intend with distinct(r.data), because distinct itself is not a function and you would not use parentheses with it.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

Try this:

with cte as
(
select r.data,c.committed_at, c.hash,timestamp, 
ROW_NUMBER() over (Partition by r.data order by c.committed_at DESC) as ranking
from results r
     inner join commits c on (r.commit_id=c.id) 
where r.repository_id=65 AND data_type='data_model' 
)
Select data,committed_at, hash 
from cte where ranking=1 order by timestamp DESC
Ajay Gupta
  • 1,775
  • 1
  • 10
  • 22