1

I'm struggling with what should be a simple query.

An event table stores user activity in an application. Each click generates a new event and datetime stamp. I need to show a list of recently accessed records having the most recent datetime stamp. I need to only show the past 7 days of activity.

The table has an auto-increment field (eventID), which corresponds with the date_event field, so it's better to use that for determining the most recent record in the group.

I found that some records are not appearing in my results with the expected most recent datetime. So I stripped my query down the basics:

NOTE that the real-life query does not look at custID. I am including it here to narrow down on the problem.

        SELECT
            el.eventID,
            el.custID,
            el.date_event
        FROM  
            event_log el
        WHERE 
            el.custID = 12345 AND
            el.userID=987
        GROUP BY  
            el.custID
        HAVING
            MAX( el.eventID )

This is returned:

eventID     custID  date_event
346290      12345   2013-06-21 09:58:44

Here's the EXPLAIN

id  select_type     table   type    possible_keys               key     key_len     ref     rows    Extra
1   SIMPLE          el      ref     userID,custID,Composite     custID  5           const   203     Using where

If I change the query to use HAVING MIN, the results don't change.. I should see a different eventID and date_event, as there are dozens of records matching the custID and userID.

        SELECT
            el.eventID,
            el.custID,
            el.date_event
        FROM  
            event_log el
        WHERE 
            el.custID = 12345 AND
            el.userID=987
        GROUP BY  
            el.custID
        HAVING
            MIN( el.eventID )

Same results as before:

eventID     custID  date_event
346290      12345   2013-06-21 09:58:44

No change.

This tells me I have another problem, but I am not seeing what that might be.

Some pointers would be appreciated.

a coder
  • 7,530
  • 20
  • 84
  • 131
  • 3
    Your usage of `group by` is invalid SQL which is rejected by all other DBMS. But MySQL accepts it and simply choses to return random data to you. Read this http://rpbouman.blogspot.de/2007/05/debunking-group-by-myths.html and this http://www.mysqlperformanceblog.com/2006/09/06/wrong-group-by-makes-your-queries-fragile/ to understand why –  Jun 28 '13 at 20:41
  • @Aiias, ORDER BY is applied to results after any filtering is done. Won't work. – a coder Jun 28 '13 at 20:43
  • 2
    Also, because you're not comparing the `HAVING MIN( el.eventID )` to a value, it's saying the same thing as `el.eventID IS NOT NULL`. – Ed Gibbs Jun 28 '13 at 20:51
  • possible duplicate of [Retrieving the last record in each group](http://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group) – Barmar Jun 28 '13 at 21:15

2 Answers2

3
SELECT
    el.eventID,
    el.custID,
    el.date_event
FROM  
    event_log el
WHERE 
    el.custID = 12345 AND
    el.userID=987 AND
    el.eventID IN (SELECT MAX(eventID)
                   FROM event_log
                   WHERE custID = 12345
                   AND userID = 987)

Your query doesn't work because you misunderstand what HAVING does. It evaluates the expression on each line of the result set, and keeps the rows where the expression evaluates to true. The expression MAX(el.eventID) simply returns the maximum event ID selected by the query, it doesn't compare the current row to that event ID.

Another way is:

SELECT
    el.eventID,
    el.custID,
    el.date_event
FROM  
    event_log el
WHERE 
    el.custID = 12345 AND
    el.userID=987
ORDER BY eventID DESC
LIMIT 1

The more general form that works for multiple custID is:

SELECT el.*
FROM event_log el
JOIN (SELECT custID, max(date_event) maxdate
      FROM event_log
      WHERE userID = 987
      GROUP BY custID) emax
ON el.custID = emax.custID AND el.date_event = emax.maxdate
WHERE el.userID = 987
Barmar
  • 741,623
  • 53
  • 500
  • 612
  • There are multiple rows for the passed custID/userID. The second query would return one row (correct data), however the actual query is custID non-specific. The second example would only work where there is one row (custID) for that userID. In real life, there would be dozens. The first example (above) works as expected, but I haven't modified it to remove the custID criteria. – a coder Jun 28 '13 at 21:10
  • How can the second query return all rows when it uses `LIMIT 1`? – Barmar Jun 28 '13 at 21:12
  • Things get more complicated when you want to do this for multiple custIDs -- the right way is with a join. There are lots of SO questions that show the way to get the maximum row for each group. – Barmar Jun 28 '13 at 21:14
  • I am using a join and it works perfect now. Thanks. ` SELECT ev1.eventID, ev1.custID, ev1.date_event FROM event_log ev1 LEFT JOIN event_log ev2 ON ev1.emory_id = ev2.emory_id AND ev1.userID = ev2.userID AND ev1.date_event < ev2.date_event WHERE el.custID = 12345 AND el.userID=987 AND ev2.emory_id IS NULL` – a coder Jun 28 '13 at 21:30
  • Still a minor problem when I remove the custID clause (two rows for each custID are returned), but I think I'm good from here. thanks again. – a coder Jun 28 '13 at 21:34
  • See the more general query I just added. – Barmar Jun 28 '13 at 21:38
  • +1 for your explaination of HAVING. I misunderstood HAVING as a pre-selector of the result set prior to GROUP BY. Now I've learned I can use WHERE to filter data set prior to GROUP BY. – Steve Wasiura Aug 02 '13 at 13:45
0

You can use a group function in a statement containing no GROUP BY clause, but it would be equivalent to grouping on all rows. But I guess you're looking for the common syntax,

SELECT
  MIN(el.eventID) AS `min_eventID`, --> Yes it is wrong :(
  el.custID,
  el.date_event
FROM  
  event_log el
WHERE 
  el.userID = 987
GROUP BY el.custID;

But disagreements are welcome .


[ Edit ]

I think I didn't show a solution fast enough... but maybe you're rather looking for the fastest solution.
Assuming field date_event defaults to CURRENT_TIMESTAMP (am I wrong?), ordering by date_event would be a waste of time (and money, thus).
I've made some tests with 20K rows and execution time was about 5ms.

SELECT STRAIGHT_JOIN y.*
FROM ((
  SELECT MAX(eventId) as eventId
  FROM event_log
  WHERE userId = 987 AND custId = 12345
)) AS x
INNER JOIN event_log AS y
  USING (eventId);

Maybe (possibly, who knows) you didn't get the straight_join thing; as documented on the scriptures, STRAIGHT_JOINs are similar to JOINs, except that the left table is always read before the right table. Sometimes it's useful.
For your specific situation, we're likely to filter to a certain eventID before (on table "x"), not to retrieve 99,99% useless rows from table "y".

  • More disagreements expected in 3, 2, ...
Community
  • 1
  • 1