0

I have trying to get the latest results using SQL. I searched the website and found this old post from 'stackoverflow', link:

SQL query to get most recent row for each instance of a given key

I copied the most accepted answer here.

Select u.[username]
      ,u.[ip]
      ,q.[time_stamp]
From [users] As u
Inner Join (
    Select [username]
          ,max(time_stamp) as [time_stamp]
    From [users]
    Group By [username]) As [q]
On u.username = q.username
And u.time_stamp = q.time_stamp

The thing I do NOT understand is why can't we simply use below (ie: why do need the 'inner join' operation in this case)?

Select username, ip, max(time_stamp) as time_stamp
    From users
    Group By username
Community
  • 1
  • 1
wildcolor
  • 564
  • 1
  • 9
  • 23
  • 3
    (1) Tag your question with the database you are using. (2) It doesn't work. In almost all databases, it returns a syntax error because `ip` is not included in the `GROUP BY`, so the SQL engine doesn't know which value to choose (MySQL accepts the syntax but plugs in a value from an indeterminate row). – Gordon Linoff Apr 27 '16 at 12:22
  • 1
    The general GROUP BY rule says: If a GROUP BY clause is specified, each column reference in the SELECT list must either identify a grouping column or be the argument of a set function. – jarlh Apr 27 '16 at 12:23

2 Answers2

3

Most SQL variants (except for MySQL I believe) require that any non-aggregated columns in your SELECT list also be in your GROUP BY. Otherwise, for a single username with multiple ip values, which one should appear in the results? In this case you might know that the username/ip pairs are unique, but the SQL engine might not always know that, especially for more complex queries.

To clarify with an example:

Users:

username      ip              timestamp
--------      --------------  ---------
bob           167.49.122.122  2016-01-05
john          167.49.122.123  2016-02-02
bob           167.49.122.124  2016-04-01

What result would you expect to see? for "bob"

Should it be bob/167.49.122.122/2016-04-01? bob/167.49.122.124/2016-04-01? Both?

You might say, "Well, obviously I want the whole row that has the max date", but that's not obvious from your second SELECT even if it seems intuitive to a human.

Tom H
  • 46,766
  • 14
  • 87
  • 128
  • 2
    Good answer. I'd add that if you know for a fact that the username/ip pairs are unique, you could use your simple query with `GROUP BY username, ip`. Otherwise you have to use one of the more complex ways to get the right value for the ip. – jussius Apr 27 '16 at 12:36
3

Your syntax suggests SQL Server (or perhaps MS Access, but I'll be optimistic). A typical approach in ANSI-standard SQL is to use row_number():

select u.*
from (select u.*,
             row_number() over (partition by username order by time_stamp desc) as seqnum
      from users u
     ) u
where seqnum = 1;

There are definitely other methods as well that do not require an explicit join.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786