1

This is i think a simple problem but i can't seem to solve it.

I want to select the newest result from a table and join it with a single element in another table.

To put it better, here's a simple schema:

Table 1 - Person
personId -PK - INT - AUTO ID
name - VARCHAR

Table 2 - Event
eventId - PK - INT - AUTO ID
personId - FK
timestamp  - DATETIME
event - ENUM ('Went Out', 'Came back')

What I'd like to do is return a list of all people and the latest action each person performed

Example Result:

name| personId | timestamp | eventId | event

bob | 1 | 2011-08-7 3 | 'went out'

I did a simple query joining the two tables and then did a group by personId and order by timestamp but the result that was returned was always the first action for the person, not their latest.

Any Ideas?

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
Ciaran Fisher
  • 1,006
  • 9
  • 16
  • This may be a rather obvious question, but: You were ordering the timestamp in DESCENDING order, right? – Anthony Grist Oct 19 '11 at 13:07
  • 2
    Added the `[greatest-n-per-group-tag]`. Follow the links on the right, under the **Related** header. Or this: http://stackoverflow.com/questions/tagged/greatest-n-per-group+mysql – ypercubeᵀᴹ Oct 19 '11 at 13:08
  • 1
    And an excellent explanation here: http://stackoverflow.com/questions/7745609/sql-select-only-rows-with-max-value-on-a-column/7745635#7745635 – ypercubeᵀᴹ Oct 19 '11 at 13:11

5 Answers5

3
SELECT p.name, p.personId, e.timestamp, e.eventId, e.event
FROM person p
  INNER JOIN Event e 
    ON e.eventId = 
      ( SELECT MAX(eventId)
        FROM Event 
        WHERE personId = p.personId 
        GROUP BY personId 
        LIMIT 1 )

OR

SELECT p.Name, p.ID, me.timestamp, me.ID, me.event
FROM person p
INNER JOIN (
            SELECT id, timestamp, event
            FROM Event 
            WHERE personId = p.ID               
            ORDER BY timestamp DESC LIMIT 1
           ) me
ON p.ID = me.id

PS: sorry but can't test both queries right now

sll
  • 61,540
  • 22
  • 104
  • 156
  • No need for `MAX(ID)`. You can have `ORDER BY whatever LIMIT 1` – ypercubeᵀᴹ Oct 19 '11 at 13:13
  • If 'Id' is a (guaranteed) incrementing number and it's indexed then the most efficient way to query the database would be Max(Id). This is called a 'Covered Key', I think. I don't wish to argue with the logic of this answer but there is a little trick here you might want to consider. – Hugh Jones Oct 19 '11 at 13:24
  • @Hugh Jones : initially I've used MAX(ID) but then changed. Interesting to ask `ypercube` why he suggested this, really I did not paid much attention to this, my bad anyway – sll Oct 19 '11 at 13:30
  • tried both queries, the first query went mad and returned every person linked to every event, which makes no sense as the WHERE wouldn't have been true... the second i couldn't get to work as it couldn't find p.ID – Ciaran Fisher Oct 19 '11 at 13:39
  • @Ciaran Fisher : I believe you got the idea so please tweak it slightly if required or provide more infor regarding what is not working, anyway I have not DB in hand to check queries right now – sll Oct 19 '11 at 13:41
  • @Hugh: If `id` and `timestamp` define the same order, yes, `ORDER BY id` or `MAX(id)` would be faster then `ORDER BY timestamp`. The `ORDER BY id LIMIT 1` has an advantage over `MAX(ID)` as you can show the whole row, as in the first query, without need for a subquery. – ypercubeᵀᴹ Oct 19 '11 at 13:42
  • @ypercube : agreed. I reckon 'slightly faster' will turn out to be an understatement, though. – Hugh Jones Oct 19 '11 at 13:48
  • @Ciaran: Try again the first query. It should work correctly now. There was probably a mixup of the 2 ids. – ypercubeᵀᴹ Oct 19 '11 at 13:57
  • @Hugh: I already corrected that to just 'faster'. But it won't be much of a difference as only 1 result is needed to be read from the index. The differnce comes when you want to do `LIMIT x` and the clustered index (which is usually the id) results in sequential readings from the disk. – ypercubeᵀᴹ Oct 19 '11 at 14:00
  • Thats for your help man, the first query works now but it's a lot slower than the answer i accepted. The query is working on a dataset of 85,000 events and around 500 people. – Ciaran Fisher Oct 19 '11 at 14:12
  • @Ciaran Fisher : I believe this is because ORDER BY, I'll change to MAX(ID) as in initial my answer, it should be faster – sll Oct 19 '11 at 14:26
  • @Ciaran Fisher : Do please tell us if/how much faster the Max(Id) technique is :) – Hugh Jones Oct 19 '11 at 14:43
  • @HughJones - Results are very interesting! With Max(id) the results are pretty much identical between both queries over 10 runs – Ciaran Fisher Oct 19 '11 at 17:00
  • @Ciaran : ok - then go with the safer option (but dont tell ypercube he was right ;P) – Hugh Jones Oct 20 '11 at 09:08
3
SELECT 
  t1.Name,
  t1.PersonId,
  t2.TimeStamp,
  t2.EventId,
  t2.Event
FROM Table1 t1
INNER JOIN Table2 t2 ON t2.PersonId = t1.PersonID
INNER JOIN (SELECT
              PersonId,
              MAX(TimeStamp) as LastEventDateTime
            FROM Table2 
            GROUP BY PersonID) LE 
  ON LE.PersonID = t2.PersonID 
    AND LE.LastEventDateTime = t2.TimeStamp
John Hartsock
  • 85,422
  • 23
  • 131
  • 146
0

you'd want to do an

ORDER by `timestamp` DESC

(desc from descending) to get the highest timestamp value instead of the lowest

Nanne
  • 64,065
  • 16
  • 119
  • 163
0

The ANSI standard way would be:

select name, personid, timestamp, eventid, event
from person
join event on event.personid=person.personid
  and event.timestamp=(select max(timestamp) from event e2
                       where e2.personid=person.personid)

I haven't used MySQL in a while and I don't have an installation handy, but you might get what you want with:

select name, personid, timestamp, eventid, event
from person
join event on event.personid=person.personid
group by personid
order by personid, timestamp desc

It's non-standard because by the standard, anything in the select must be in the group-by or be an aggregate, and here we don't want to do either. But as I recall MySQL doesn't require that, so I'd give this a whirl and see what happens.

Jay
  • 26,876
  • 10
  • 61
  • 112
  • the bottom query is what i actually tried first but for some reason it always results in the first event the person performed instead of the latest – Ciaran Fisher Oct 19 '11 at 13:38
  • @Jay: your second query will not work as expected, not even in MySQL. – ypercubeᵀᴹ Oct 19 '11 at 13:49
  • As for standards, you can have a look at this: http://stackoverflow.com/questions/7594865/why-does-mysql-add-a-feature-that-conflicts-with-sql-standards/7596265#7596265 – ypercubeᵀᴹ Oct 19 '11 at 14:03
  • Okay, disregard the second query. As I said, I don't have a MySQL install handy so I couldn't test it, and it would bounce with a syntax error in standard SQL. – Jay Oct 24 '11 at 17:34
0

An alternative solution, making use of a covered key, assumes that order by Id would yield the same results as order by timestamp

SELECT p.Name, p.ID, me.timestamp, me.ID, me.event
FROM person p
JOIN (
      SELECT personId, MAX(ID) id 
      FROM Event 
      WHERE personId = p.ID               
      GROUP BY personId
     ) me
ON p.ID = me.id

Order by timestamp is more natural and probably safer, but this is quicker.

Hugh Jones
  • 2,706
  • 19
  • 30