1

I'm querying a very large activity log in a MySQL database, trying to figure out what each user was doing at midnight on a particular day. The catch is that I'm only interested in a subset of possible log messages, and I want the last log entry in that subset prior to midnight. Here's what I have so far:

SELECT user, dateAndTime, msg
FROM Log
WHERE msg in ('off', 'on', 'sleep', 'wake')
  AND logDate = '2011-12-31';
ORDER BY user ASC, dateAndTime DESC

(We have user and msg as type VARCHAR, logDate as DATE, and dateAndTime as DATETIME.)

This seems like a good first step. The results appear in groups by user name and sorted by time. Is there a way to pull off just the first row of those results for each user? I could always post-process the results, but this seems like something I should be able to do with the query itself. I'm pretty new to thinking in sets instead of procedurally, so I apologize in advance if the solution is supposed to be obvious.

Additional notes...

I've found many other questions on SO that deal with this same kind of problem, but my extra constraints are making it more difficult. I tried modifying this query, but I think the GROUP BY clause somehow lost the restriction on msg.

I'm also working under a time constraint. The queries are going through a web service (that I don't control) that automatically times out after 5 minutes. The Log table is big (almost 2 million rows just for 2011-12-31), so most of the solutions I tried involving joins just timed out.

UPDATE:

I discovered something else about the data I previously didn't know. There are multiple rows per user for a given dateAndTime.

Community
  • 1
  • 1
Michael Kristofik
  • 34,290
  • 15
  • 75
  • 125

3 Answers3

4
SELECT l.user, l.dateAndTime, l.msg
    FROM Log l
        INNER JOIN (SELECT user, MAX(dateAndTime) AS MaxDateTime
                        FROM log
                        WHERE msg in ('off', 'on', 'sleep', 'wake')
                            AND logDate = '2011-12-31'
                        GROUP BY user) q
            ON l.user = q.user
                AND l.dateAndTime = q.MaxDateTime
    WHERE l.msg in ('off', 'on', 'sleep', 'wake')
    ORDER BY l.user ASC
Joe Stefanelli
  • 132,803
  • 19
  • 237
  • 235
  • 1
    Since the OP seems to have issues with join performance, I'll just add the suggestion to check the execution plan, and check or consider indexes on logDate, user, dateAndTime. – Mike Sherrill 'Cat Recall' Jan 19 '12 at 15:55
  • So I just discovered that there can be multiple rows per user for a given `dateAndTime`. This query returns every row that matches `MaxDateTime`, not just the subset of `msg` I'm interested in. – Michael Kristofik Jan 19 '12 at 16:06
  • @Catcall, unfortunately, the query web service is my only window into the db. – Michael Kristofik Jan 19 '12 at 16:07
  • @Kristo Repeat the test for msg in an outer WHERE clause. I'll update the answer. After that, if there are still multiple, you'll need to supply an additional rule for how to choose which one to keep. – Joe Stefanelli Jan 19 '12 at 16:10
  • @JoeStefanelli, I added your additional WHERE clause and a LIMIT 1000 at the end. The new query *appears* to be working. I am seeing a few pairs where `off` and `wake` show up with the same timestamp. I'd like to choose `off` only in those cases. – Michael Kristofik Jan 19 '12 at 16:21
1

WARNING: Not guaranteed to work always, but should be fast:

select * from
(SELECT user, dateAndTime, group_concat(msg) allMsgs
 FROM Log
 WHERE msg in ('off', 'on', 'sleep', 'wake')
   AND logDate = '2011-12-31'
 GROUP BY user, dateAndTime
 ORDER BY user ASC, dateAndTime DESC) v
group by user

In MySQL, columns that are neither aggregated nor grouped can still be included in the select clause of a grouped query. Such columns are not guaranteed to be the value from any particular row within the group, according to MySQL documentation (since this is a shortcut to include values that are functionally dependant on fields that are grouped), but normally seem to be the first value accessed within the group. This functionality is susceptible to changes in the query plan, which is why all ordering takes place within the inline view - so that the grouping should take place outside the part of the query that could be changed by changes in the query plan.

  • +1 for an explanation of why `GROUP BY` was causing me problems. This *almost* works except for cases where there's more than one `msg` I care about for a given `user` and `dateAndTime`. – Michael Kristofik Jan 19 '12 at 16:30
  • @Kristo: I have amended the query so that it should now return all msgs from the same dateAndTime, for the same user, concatenated in a single field for a single row. –  Jan 19 '12 at 16:35
0
SELECT log.user, log.msg, log.dateAndTime 
FROM(
  SELECT user, msg, max(dateAndTime) as maxdatetime
  FROM Log
  WHERE msg in ('off', 'on', 'sleep', 'wake')
    AND logDate = '2011-12-31';
  GROUP BY user, msg    
) inner, Log log
WHERE log.user = inner.user 
AND log.msg = inner.msg 
AND log.dateAndTime = inner.maxdatetime
ORDER BY log.user ASC, log.dateAndTime DESC
bpgergo
  • 15,669
  • 5
  • 44
  • 68