0

This is related to Get records with max value for each group of grouped SQL results, except that the necessary groups are in a separate table.

Let's say that I have a couple of buildings, each building has some offices, and I have some people who "check in" to those offices to do work.

This table, called "offices", matches buildings to offices:

building   office
---
Seuss      Yertle
Seuss      Cubbins
Milne      Pooh
Milne      Eeyore
Milne      Roo

This table, called "checkins", records when people worked in each office:

id   office   person   timestamp
---
1    Yertle   Aaron    100
2    Cubbins  Aaron    200
3    Pooh     Aaron    300
4    Cubbins  Charlie  300
5    Cubbins  Aaron    700
6    Eeyore   Beth     600
7    Pooh     Beth     400

I'd like to generate a table that tells me, for each building-person combo, which check-in was the most recent one for that person in that building:

 building  person   checkin_id  office   timestamp
 ---
 Seuss     Aaron    5           Cubbins  700
 Milne     Aaron    3           Pooh     300
 Milne     Beth     6           Eeyore   600
 Seuss     Charlie  4           Cubbins  300

I'm at a loss for how do I do this. The standard trick involves joining a table to itself while comparing the relevant value, then throwing away the rows where there is no bigger value. I assume I'll need two copies of "checkins" and two copies of "buildings" with a complex join between them, but I can't seem to get the NULLs to show up in the correct place.

I'm using MySQL, if that helps.

Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
onigame
  • 214
  • 3
  • 10

3 Answers3

1

There is a trick using group_concat() and substring_index() that allows you to do this with a single group by:

select o.building, c.person,
       max(c.id) as checkinid,
       substring_index(group_concat(c.office order by timestamp desc), ',', 1) as office,
       max(c.timestamp) as timestamp
from offices o join
     checkins c
     on o.office = c.office
group by o.building, c.person;

This version assumes that id and timestamp increase together, so max() can be used for both.

Also, group_concat() -- by default -- has a limit of about 1,000 characters for the intermediate result, so this won't work if there are lots and lots of offices for a person/building combination or if the offices have long names. Of course, the separator can be changed if commas appear in the office name.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • It's a very smart solution. To avoid the group_concat() limitation, it's possible to change it before with a `SET GLOBAL group_concat_max_len = 1000000;`, no ? – Daniel E. Jan 14 '19 at 14:00
  • @DanielE. . . . Yes. You can readily change the value from the default of 1024 (https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_group_concat_max_len). – Gordon Linoff Jan 14 '19 at 14:14
  • Nice. Doesn't work for my scenario because there's no guarantee that id and timestamp increase together -- id correlates to when the event was logged on the server, timestamp represents when the event actually happened. – onigame Jan 14 '19 at 21:26
  • @onigame . . . You can actually use the same trick on `id`. `max(id)` is just simpler if it increases with the `timestamp` -- which is often the case. – Gordon Linoff Jan 14 '19 at 23:22
0

use corelated subquery

 select b.* from
  (select o.building,c.person,c.id as checkinid,
   c.office,c.timestamp from
   offices o join checkins c
   on o.office=c.office
  ) b
  where b.timestamp = (select max(a.timestamp)
                      from (
                        select o.building,c.person,
                        c.office,c.timestamp from
                        offices o join checkins c
                        on o.office=c.office
                         ) as a  where a.building=b.building and  
                     a.person=b.person 
                   )
       order by person

output

building    person  checkinid   office  timestamp
Milne       Aaron     3         Pooh     300
Seuss       Aaron     5         Cubbins  700
Milne       Beth      6         Eeyore   600
Seuss       Charlie   4         Cubbins  300
Zaynul Abadin Tuhin
  • 31,407
  • 5
  • 33
  • 63
0

To get the result you want, you have to create a table which has all checkins in each office (JOIN offices to checkins) and then JOIN that to a table of the latest checkin for each person in each building:

SELECT o.building, c.person, c.id AS checkin_id, c.office, c.timestamp
FROM offices o
JOIN checkins c ON c.office = o.office
JOIN (SELECT o.building, c.person, MAX(c.timestamp) AS timestamp
      FROM offices o
      JOIN checkins c ON c.office = o.office
      GROUP BY o.building, c.person) t ON t.building = o.building AND t.person = c.person AND t.timestamp = c.timestamp
ORDER BY c.person, c.office

Output:

building    person      checkin_id  office      timestamp
Seuss       Aaron       5           Cubbins     700
Milne       Aaron       3           Pooh        300
Milne       Beth        6           Eeyore      600
Seuss       Charlie     4           Cubbins     300

Demo on dbfiddle

Nick
  • 138,499
  • 22
  • 57
  • 95